[jira] [Created] (YARN-2229) Making ContainerId long type
Tsuyoshi OZAWA created YARN-2229: Summary: Making ContainerId long type Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, and the lower 22 bits are for the sequence number of the ids. This preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. In this JIRA we need to define the new container id format while preserving backward compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
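To make the overflow concern concrete, here is a minimal sketch assuming the YARN-2052 layout described above (upper 10 bits epoch, lower 22 bits sequence); the class, helper names, and the 40-bit long layout are illustrative assumptions, not the actual YARN code:
{code}
// Illustrative sketch only -- not the real ContainerId implementation.
public final class ContainerIdPackingSketch {
  private static final int EPOCH_BITS = 10;
  private static final int SEQ_BITS = 22;
  private static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  /** Current 32-bit layout: the epoch wraps around after 2^10 = 1024 RM restarts. */
  static int packAsInt(int epoch, int sequence) {
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  /** Hypothetical long-based layout (40-bit sequence is an assumption) with far more epoch headroom. */
  static long packAsLong(long epoch, long sequence) {
    return (epoch << 40) | (sequence & ((1L << 40) - 1));
  }
}
{code}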
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046722#comment-14046722 ] Hudson commented on YARN-2201: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5794 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5794/]) YARN-2201. Made TestRMWebServicesAppsModification be independent of the changes on yarn-default.xml. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606285) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Fix For: 2.5.0 > > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.A
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046721#comment-14046721 ] Zhijie Shen commented on YARN-2201: --- Committed to trunk, branch-2. Thanks Varun for the patch, and Ray for review! > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482) > I'm opening this JIRA as a discussion for the best way to fix this. I've got > a few ideas, but I wo
[jira] [Created] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple
Zhijie Shen created YARN-2228: - Summary: TimelineServer should load pseudo authentication filter when authentication = simple Key: YARN-2228 URL: https://issues.apache.org/jira/browse/YARN-2228 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen When Kerberos authentication is not enabled, we should let the timeline server work with the pseudo authentication filter. In this way, the server is able to detect the request user by checking "user.name". On the other hand, the timeline client should append "user.name" in the non-secure case as well, so that ACLs keep working in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
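As a rough illustration of the client-side half of this, the sketch below appends a "user.name" query parameter only when security is off; the helper class is hypothetical and is not the actual TimelineClient code:
{code}
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical helper, only to illustrate appending user.name in the non-secure case.
public final class PseudoAuthUriSketch {
  static URI withUserName(URI base) throws IOException, URISyntaxException {
    if (UserGroupInformation.isSecurityEnabled()) {
      return base; // Kerberos path: the auth filter identifies the user, no user.name needed
    }
    String user = UserGroupInformation.getCurrentUser().getShortUserName();
    String query = base.getQuery() == null
        ? "user.name=" + user
        : base.getQuery() + "&user.name=" + user;
    return new URI(base.getScheme(), base.getAuthority(), base.getPath(), query, base.getFragment());
  }
}
{code}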
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046707#comment-14046707 ] Tsuyoshi OZAWA commented on YARN-2052: -- The test failure is not related. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, > YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, > YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046703#comment-14046703 ] Hadoop QA commented on YARN-2052: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652938/YARN-2052.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4129//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4129//console This message is automatically generated. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, > YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, > YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046647#comment-14046647 ] Hadoop QA commented on YARN-614: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652934/YARN-614.13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4128//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4128//console This message is automatically generated. > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, > YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2052: - Attachment: YARN-2052.11.patch Updated the patch to address the comments: * Bumped up the version of FileSystemRMStateStore. * Refactored {{getAndIncrement}} of FileSystemStateStore/ZKRMStateStore to remove the duplicate check of the epoch znode/file. * Renamed RMEpoch.java to Epoch.java and RMEpochPBImpl.java to EpochPBImpl.java. For consistency, updated the file/znode name of EPOCH_NODE from "RMEpochNode" to "EpochNode". > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, > YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, > YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
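For readers following the epoch discussion, below is a simplified sketch of what a getAndIncrementEpoch routine can look like after removing the duplicate existence check; it uses raw long serialization instead of the protobuf-backed Epoch record, so it is an approximation rather than the patch itself:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified sketch; FileSystemRMStateStore/ZKRMStateStore use their own record
// formats and retry helpers.
public class EpochSketch {
  public static synchronized long getAndIncrementEpoch(FileSystem fs, Path epochNodePath)
      throws IOException {
    long currentEpoch = 0;
    if (fs.exists(epochNodePath)) {            // single existence check
      try (FSDataInputStream in = fs.open(epochNodePath)) {
        currentEpoch = in.readLong();
      }
    }
    // Persist the incremented epoch so the next RM restart gets a fresh value.
    try (FSDataOutputStream out = fs.create(epochNodePath, true)) {
      out.writeLong(currentEpoch + 1);
    }
    return currentEpoch;
  }
}
{code}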
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046590#comment-14046590 ] Wangda Tan commented on YARN-1408: -- Hi [~sunilg], Thanks for working out this patch so fast! *A major problem I've seen.* The ResourceRequest stored in RMContainerImpl should include the rack/any RRs. Currently, only one ResourceRequest is stored in RMContainerImpl, which may not be enough for recovery in the following cases: Case 1: An RR may contain other fields like relaxLocality, etc. Assume an RR is node-local with relaxLocality=true (the default), and its rack-local/any RRs have relaxLocality=false. In your current implementation, you cannot fully recover the original RRs. Case 2: The rack-local RR will be missing. Assume an RR is node-local; when resource allocation happens, the outstanding rack-local/any numContainers will be decreased. You can check AppSchedulingInfo#allocateNodeLocal for the logic of how the outstanding rack/any #containers are decreased. *My thoughts about how to implement this:* In FiCaScheduler#allocate, appSchedulingInfo.allocate will be invoked. You can edit appSchedulingInfo.allocate to return a list of RRs, including node/rack/any if possible, and pass those RRs to RMContainerImpl. And could you please elaborate on this? bq. AM would have asked for NodeLocal in another Hosts, which may not be able to recover. Does it make sense to you? I'll review minor issues and test cases in the next cycle. Thanks, Wangda > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capcity is been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
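To make the suggestion concrete, here is a rough sketch of the flow described above: the allocate path hands back all matching requests (node/rack/any) so they can be stored on the RMContainer and re-requested after preemption. The class shape and signatures are simplified assumptions, not the actual patch:
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative only: real AppSchedulingInfo/RMContainerImpl signatures differ.
class ResourceRequestRecoverySketch {

  /**
   * Stand-in for AppSchedulingInfo#allocateNodeLocal: a node-local allocation also
   * decrements the outstanding rack-local and ANY requests, so all three levels
   * (with relaxLocality intact) must be captured to rebuild the original asks.
   */
  List<ResourceRequest> allocateNodeLocal(ResourceRequest node, ResourceRequest rack,
      ResourceRequest any) {
    node.setNumContainers(node.getNumContainers() - 1);
    rack.setNumContainers(rack.getNumContainers() - 1);
    any.setNumContainers(any.getNumContainers() - 1);
    // The caller can then pass this full list to RMContainerImpl instead of a single RR.
    return Arrays.asList(node, rack, any);
  }
}
{code}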
[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-614: - Attachment: YARN-614.13.patch renamed a unit test name > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, > YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046572#comment-14046572 ] Jian He commented on YARN-614: -- +1 > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, > YARN-614.8.patch, YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046566#comment-14046566 ] Vinod Kumar Vavilapalli commented on YARN-2225: --- It breaks compatibility w.r.t. behavior, asking existing users who care about it to turn it on explicitly. bq. In spirit, virtual memory check has been a pain and we end up recommending users to turn it off. I have had a different experience. It indeed is a pain for testing, both in Hadoop and in higher-level frameworks, but it has been invaluable in real-life clusters for stopping runaway jobs, specifically the non-Java ones, from affecting the cluster. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server
Junping Du created YARN-2227: Summary: Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server Key: YARN-2227 URL: https://issues.apache.org/jira/browse/YARN-2227 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Junping Du Assignee: Junping Du In rolling upgrade semantics, we should handle cases where an old client talks with new servers, as long as only compatible changes happen in the RPC protocol. Under these semantics, there is no guarantee that a new client can talk with an old server, which requires us to pay special attention to the upgrade sequence. Even so, it is still hard to handle NM-to-RM communication, because there is a client and a server on both sides: in the regular heartbeat, the NM is the client and the RM is the server; when the RM launches the AM, it goes through containerMgrProxy, and the RM is the client while the NM is the server. We should get rid of this situation, e.g. by removing containerMgrProxy from the RM and using another way to launch the container. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server
[ https://issues.apache.org/jira/browse/YARN-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2227: - Issue Type: Sub-task (was: Improvement) Parent: YARN-666 > Move containerMgrProxy from RM's AMLaunch to get rid of issues that new > client talking with old server > -- > > Key: YARN-2227 > URL: https://issues.apache.org/jira/browse/YARN-2227 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Junping Du >Assignee: Junping Du > > In rolling upgrade semantics, we should handle cases that old client should > talk with new servers if only compatible changes happen in RPC protocol. In > this semantics, there is no guarantee that new client should able to talk > with old server which need us to pay specially attention on upgrading > sequence. Even this, we will find that it is still hard to deal with NM talk > with RM as there are both client and server at both side: in regular > heartbeat, NM is client and RM is server; when RM launch AM client, it go > through containerMgrProxy and RM is client while NM is server in this case. > We should get rid of this situation, i.e. by removing containerMgrProxy in RM > and use other way to launch container. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046560#comment-14046560 ] Wangda Tan commented on YARN-2104: -- Thanks [~maysamyabandeh] and [~jlowe] for review and commit! > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
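For context, one plausible form of the fix is simply pointing the hard-coded filter at the new column index; the sketch below is illustrative only (the real pages build this JavaScript through the webapp helpers, and the committed patch may differ in detail):
{code}
// Sketch only: illustrates moving the queue filter from column 3 to column 4
// after the "Application Type" column was added to the applications table.
public class QueueFilterSketch {
  static String queueFilterScript() {
    return "if (q == 'root') q = '';"
        + "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';"
        + "$('#apps').dataTable().fnFilter(q, 4, true);";
  }
}
{code}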
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2052: - Attachment: (was: YARN-2052.11.patch) > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, > YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, > YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046557#comment-14046557 ] Jian He commented on YARN-2052: --- can you rename RMEpoch.java to Epoch and similar RMEpochPBimpl too ? > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, > YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, > YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2052: - Attachment: YARN-2052.11.patch > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, > YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, > YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046544#comment-14046544 ] Hadoop QA commented on YARN-2052: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652917/YARN-2052.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4127//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4127//console This message is automatically generated. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, > YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, > YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
[ https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046531#comment-14046531 ] Jian He commented on YARN-2226: --- Actually, the FileSystem and ZK state stores have separate versions because they might diverge at some point. Closing this as invalid. > RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores > - > > Key: YARN-2226 > URL: https://issues.apache.org/jira/browse/YARN-2226 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Labels: newbie > > We need all state store impls to be versioned. Should move > ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning > applies to all stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
[ https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-2226. --- Resolution: Invalid > RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores > - > > Key: YARN-2226 > URL: https://issues.apache.org/jira/browse/YARN-2226 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Labels: newbie > > We need all state store impls to be versioned. Should move > ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning > applies to all stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046530#comment-14046530 ] Jian He commented on YARN-2052: --- - Actually, the FileSystem and ZK state stores have separate versions because they might diverge at some point; we should bump up the FileSystem version too in this patch. - These two calls are duplicated in getAndIncrement of FileSystemStateStore/ZKRMStateStore; we can consolidate them into one: "fs.exists(epochNodePath)/ existsWithRetries(epochNodePath, true) != null;" > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, > YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, > YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046528#comment-14046528 ] Hudson commented on YARN-2104: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5792 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5792/]) YARN-2104. Scheduler queue filter failed to work because index of queue column changed. Contributed by Wangda Tan (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606265) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046512#comment-14046512 ] Jason Lowe commented on YARN-2104: -- +1 lgtm. The test failure is unrelated. Committing this. > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2052: - Attachment: YARN-2052.10.patch [~jianhe], good catch. Updated MemoryRMStateStore and its tests. [~vinodkv], yes, let's do this on YARN-2226. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.10.patch, > YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, > YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, > YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
[ https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2226: -- Assignee: (was: Vinod Kumar Vavilapalli) Labels: newbie (was: ) > RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores > - > > Key: YARN-2226 > URL: https://issues.apache.org/jira/browse/YARN-2226 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Labels: newbie > > We need all state store impls to be versioned. Should move > ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning > applies to all stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046504#comment-14046504 ] Vinod Kumar Vavilapalli commented on YARN-2052: --- Not related to this patch, but I think CURRENT_VERSION_INFO shouldn't be in ZKRMStateStore. Filed YARN-2226. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, > YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, > YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
Vinod Kumar Vavilapalli created YARN-2226: - Summary: RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores Key: YARN-2226 URL: https://issues.apache.org/jira/browse/YARN-2226 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli We need all state store impls to be versioned. Should move ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning applies to all stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046494#comment-14046494 ] Karthik Kambatla edited comment on YARN-2225 at 6/27/14 10:48 PM: -- According to our compatibility guide, "The default values of Hadoop-defined properties can be changed across minor/major releases, but will remain the same across point releases within a minor release." So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In spirit, virtual memory check has been a pain and we end up recommending users to turn it off. was (Author: kkambatl): According to your compatibility guide, "The default values of Hadoop-defined properties can be changed across minor/major releases, but will remain the same across point releases within a minor release." So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In spirit, virtual memory check has been a pain and we end up recommending users to turn it off. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046494#comment-14046494 ] Karthik Kambatla commented on YARN-2225: According to your compatibility guide, "The default values of Hadoop-defined properties can be changed across minor/major releases, but will remain the same across point releases within a minor release." So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In spirit, virtual memory check has been a pain and we end up recommending users to turn it off. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046489#comment-14046489 ] Vinod Kumar Vavilapalli commented on YARN-2225: --- -1 for changing the default.. This breaks compatibility. bq. The virtual memory check may not be the best way to isolate applications. Virtual memory is not the constrained resource. I still see a lot of apps that needs isolation w.r.t vmem. It's not about which resource is constrained, it is about isolation. We already identify physical memory as constrained and use that as the main scheduling dimension. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046480#comment-14046480 ] Hadoop QA commented on YARN-2225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652908/YARN-2225.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4126//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4126//console This message is automatically generated. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings
[ https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046469#comment-14046469 ] Hadoop QA commented on YARN-2224: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652903/YARN-2224.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4125//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4125//console This message is automatically generated. > Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective > of the default settings > - > > Key: YARN-2224 > URL: https://issues.apache.org/jira/browse/YARN-2224 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2224.patch > > > If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test > will fail. Make the test pass not rely on the default settings but just let > it verify that once the setting is turned on it actually does the memory > check. See YARN-2225 which suggests we turn the default off. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2225: Attachment: YARN-2225.patch > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this off by default and let users turn it on if they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2225: --- Assignee: Anubhav Dhoot > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this off by default and let users turn it on if they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2225: Description: The virtual memory check may not be the best way to isolate applications. Virtual memory is not the constrained resource. It would be better if we limit the swapping of the task using swapiness instead. This patch will turn this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if they need to. (was: The virtual memory check may not be the best way to isolate applications. Virtual memory is not the constrained resource. It would be better if we limit the swapping of the task using swapiness instead. This patch will turn this off by default and let users turn it on if they need to.) > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings
[ https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046446#comment-14046446 ] Anubhav Dhoot commented on YARN-2224: - Once the test is made resilient, we can decide in YARN-2225 to turn the defaults off > Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective > of the default settings > - > > Key: YARN-2224 > URL: https://issues.apache.org/jira/browse/YARN-2224 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2224.patch > > > If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test > will fail. Make the test pass not rely on the default settings but just let > it verify that once the setting is turned on it actually does the memory > check. See YARN-2225 which suggests we turn the default off. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2225) Turn the virtual memory check to be off by default
Anubhav Dhoot created YARN-2225: --- Summary: Turn the virtual memory check to be off by default Key: YARN-2225 URL: https://issues.apache.org/jira/browse/YARN-2225 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot The virtual memory check may not be the best way to isolate applications. Virtual memory is not the constrained resource. It would be better if we limit the swapping of the task using swappiness instead. This patch will turn this off by default and let users turn it on if they need to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings
[ https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2224: Description: If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test will fail. Make the test pass not rely on the default settings but just let it verify that once the setting is turned on it actually does the memory check. See YARN-2225 which suggests we turn the default off. (was: If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test will fail. Make the test pass not rely on the default settings but just let it verify that once the setting is turned on it actually does the memory check. ) > Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective > of the default settings > - > > Key: YARN-2224 > URL: https://issues.apache.org/jira/browse/YARN-2224 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2224.patch > > > If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test > will fail. Make the test pass not rely on the default settings but just let > it verify that once the setting is turned on it actually does the memory > check. See YARN-2225 which suggests we turn the default off. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings
[ https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2224: Attachment: YARN-2224.patch Sets the flag to be true so that the test does not fail if the default was set to false. > Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective > of the default settings > - > > Key: YARN-2224 > URL: https://issues.apache.org/jira/browse/YARN-2224 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2224.patch > > > If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test > will fail. Make the test pass not rely on the default settings but just let > it verify that once the setting is turned on it actually does the memory > check. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings
Anubhav Dhoot created YARN-2224: --- Summary: Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings Key: YARN-2224 URL: https://issues.apache.org/jira/browse/YARN-2224 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test will fail. Make the test not rely on the default settings; instead, let it verify that once the setting is turned on it actually does the memory check. -- This message was sent by Atlassian JIRA (v6.2#6252)
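For readers following YARN-2224 and YARN-2225 together: below is a minimal sketch of how a test (or an operator) can pin the virtual-memory check explicitly instead of relying on DEFAULT_NM_VMEM_CHECK_ENABLED. It assumes only the standard YarnConfiguration constants in hadoop-yarn-api and is an illustration, not the attached patch.
{code}
// Minimal sketch, assuming the standard YarnConfiguration constants:
// pin the NodeManager virtual-memory check regardless of what the
// release default happens to be (the hardening YARN-2224 describes).
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class VmemCheckFlagExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Turn the check on explicitly; if YARN-2225 flips the default to
    // false, users and tests that want the check must set this themselves.
    conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, true);

    boolean enabled = conf.getBoolean(
        YarnConfiguration.NM_VMEM_CHECK_ENABLED,
        YarnConfiguration.DEFAULT_NM_VMEM_CHECK_ENABLED);
    System.out.println("vmem check enabled = " + enabled);
  }
}
{code}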
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046425#comment-14046425 ] Maysam Yabandeh commented on YARN-2104: --- +1 Worked for us. And the failed unit test seems irrelevant. > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
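The snippet quoted above is Java code in the RM scheduler webapp page that emits the DataTables filter JavaScript. Below is a hedged sketch of the kind of fix the report implies (bumping the hard-coded column index from 3 to 4 to account for the new "Application Type" column); the actual committed patch may differ.
{code}
// Hedged sketch of the fix implied above, not necessarily the committed
// patch: after YARN-563 added the "Application Type" column, the queue
// column sits at index 4, so the emitted DataTables filter must target
// column 4 instead of the hard-coded 3.
"if (q == 'root') q = '';",
"else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';",
"$('#apps').dataTable().fnFilter(q, 4, true);",
{code}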
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046419#comment-14046419 ] Jian He commented on YARN-2052: --- Patch looks good overall. Can you also update MemoryStateStore so that we can test that the containerId issued by the new RM is correct? Thanks. {code} -assertEquals(4, schedulerAttempt.getNewContainerId()); +assertEquals(1, schedulerAttempt.getNewContainerId()); {code} > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, > YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, > YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity, the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
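As background for the description above, here is a small sketch of how a ContainerId is composed from the application attempt plus the monotonically increasing sequence number (the number the RM loses track of across restarts). It uses the public factory methods in hadoop-yarn-api; the timestamp and ids are illustrative only.
{code}
// Sketch only: how container ids are derived from the app identifier plus
// a monotonically increasing sequence number. Values are illustrative.
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdSequenceExample {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1398450350082L, 1);
    ApplicationAttemptId attempt = ApplicationAttemptId.newInstance(appId, 1);

    // The RM hands out the next value of this counter for every new
    // container; after a restart it must recover (or re-derive) it.
    ContainerId first = ContainerId.newInstance(attempt, 1);
    ContainerId second = ContainerId.newInstance(attempt, 2);

    System.out.println(first);   // e.g. container_1398450350082_0001_01_000001
    System.out.println(second);  // e.g. container_1398450350082_0001_01_000002
  }
}
{code}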
[jira] [Updated] (YARN-2223) NPE on ResourceManager recover
[ https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Bringhurst updated YARN-2223: - Description: I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461). Both clusters have the same config (other than hostnames). Both are running on JDK8u5 (I'm not sure if this is a factor here). One cluster started up without any errors. The other started up with the following error on the RM: {noformat} 18:33:45,463 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead. 18:33:45,465 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0001 with 8 attempts and final state = KILLED 18:33:45,468 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_01 with final state: KILLED 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_02 with final state: FAILED 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_03 with final state: FAILED 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_04 with final state: FAILED 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_05 with final state: FAILED 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_06 with final state: FAILED 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_07 with final state: FAILED 18:33:45,481 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_08 with final state: FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 State change from NEW to KILLED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 State change from NEW to FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 State change from NEW to FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 State change from NEW to FAILED 18:33:45,485 INFO RMAppImpl:639 - application_1398450350082_0001 State change from NEW to KILLED 18:33:45,485 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead. 
18:33:45,485 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0002 with 8 attempts and final state = KILLED 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_01 with final state: KILLED 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_02 with final state: FAILED 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_03 with final state: FAILED 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_04 with final state: FAILED 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_05 with final state: FAILED 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_06 with final state: FAILED 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_07 with final state: FAILED 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_08 with final state: FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 State change from NEW to KILLED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 State change from NEW to FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 State change from NEW to FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 State change from NEW to FAILED 18:33:45,491 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 State change from NEW to FAILED 18:33:45,491 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 State change from NEW to FAILED 18:33:45,491 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_07 State change from NEW to FAILED 18:
[jira] [Created] (YARN-2223) NPE on ResourceManager recover
Jon Bringhurst created YARN-2223: Summary: NPE on ResourceManager recover Key: YARN-2223 URL: https://issues.apache.org/jira/browse/YARN-2223 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.1 Reporter: Jon Bringhurst I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461). Both clusters have the same config (other than hostnames). Both are running on JDK8u5 (I'm not sure if this is a factor here). One cluster started up without any errors. The other started up with the following error on the RM: {noformat} 18:33:45,463 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead. 18:33:45,465 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0001 with 8 attempts and final state = KILLED 18:33:45,468 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_01 with final state: KILLED 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_02 with final state: FAILED 18:33:45,478 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_03 with final state: FAILED 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_04 with final state: FAILED 18:33:45,479 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_05 with final state: FAILED 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_06 with final state: FAILED 18:33:45,480 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_07 with final state: FAILED 18:33:45,481 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0001_08 with final state: FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 State change from NEW to KILLED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 State change from NEW to FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 State change from NEW to FAILED 18:33:45,482 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 State change from NEW to FAILED 18:33:45,483 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 State change from NEW to FAILED 18:33:45,485 INFO RMAppImpl:639 - application_1398450350082_0001 State change from NEW to KILLED 18:33:45,485 WARN RMAppImpl:331 - The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 50]. Use the global max attempts instead. 
18:33:45,485 INFO RMAppImpl:651 - Recovering app: application_1398450350082_0002 with 8 attempts and final state = KILLED 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_01 with final state: KILLED 18:33:45,486 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_02 with final state: FAILED 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_03 with final state: FAILED 18:33:45,487 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_04 with final state: FAILED 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_05 with final state: FAILED 18:33:45,488 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_06 with final state: FAILED 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_07 with final state: FAILED 18:33:45,489 INFO RMAppAttemptImpl:691 - Recovering attempt: appattempt_1398450350082_0002_08 with final state: FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 State change from NEW to KILLED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 State change from NEW to FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 State change from NEW to FAILED 18:33:45,490 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 State change from NEW to FAILED 18:33:45,491 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 State change from NEW to FAILED 18:33:45,491 INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 State c
Re: Anyone know how to mock a secured hdfs for unit test?
Hi David and Kai, There are a couple of challenges with this, but I just figured out a pretty decent setup while working on HDFS-2856. That code isn't committed yet, but if you open patch version 5 attached to that issue and look for the TestSaslDataTransfer class, then you'll see how it works. Most of the logic for bootstrapping a MiniKDC and setting up the right HDFS configuration properties is in an abstract base class named SaslDataTransferTestCase. I hope this helps. There are a few other open issues out there related to tests in secure mode. I know of HDFS-4312 and HDFS-5410. It would be great to get more regular test coverage with something that more closely approximates a secured deployment. Chris Nauroth Hortonworks http://hortonworks.com/ On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai wrote: > Hi David, > > Quite some time ago I opened HADOOP-9952 and planned to create secured > MiniClusters by making use of MiniKDC. Unfortunately since then I didn't > get the chance to work on it yet. If you need something like that and would > contribute, please let me know and see if anything I can help with. Thanks. > > Regards, > Kai > > -Original Message- > From: Liu, David [mailto:liujion...@gmail.com] > Sent: Thursday, June 26, 2014 10:12 PM > To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; > yarn-...@hadoop.apache.org; yarn-issues@hadoop.apache.org; > mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org > Subject: Anyone know how to mock a secured hdfs for unit test? > > Hi all, > > I need to test my code which read data from secured hdfs, is there any > library to mock secured hdfs, can minihdfscluster do the work? > Any suggestion is appreciated. > > > Thanks > -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
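To make the MiniKDC approach Chris describes more concrete, here is a minimal, hedged sketch of the bootstrap sequence: start an in-process KDC, create a keytab, and switch the Hadoop security configuration to Kerberos before bringing up a mini cluster. Principal names, file locations, and the class name are illustrative assumptions; see the SaslDataTransferTestCase base class in the HDFS-2856 patch for a complete, working setup.
{code}
// Hedged sketch of bootstrapping MiniKdc for a secured mini-cluster test.
// Principal names and paths are illustrative assumptions, not a real test.
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.minikdc.MiniKdc;
import org.apache.hadoop.security.UserGroupInformation;

public class SecuredMiniClusterBootstrap {
  public static void main(String[] args) throws Exception {
    File workDir = new File("target/minikdc");
    workDir.mkdirs();
    MiniKdc kdc = new MiniKdc(MiniKdc.createConf(), workDir);
    kdc.start();

    // Create a keytab for an illustrative principal.
    File keytab = new File(workDir, "test.keytab");
    kdc.createPrincipal(keytab, "hdfs/localhost");

    // Switch Hadoop security to Kerberos and log in from the keytab.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(
        "hdfs/localhost@" + kdc.getRealm(), keytab.getAbsolutePath());

    // A secured MiniDFSCluster additionally needs the NameNode/DataNode
    // principal and keytab properties set on 'conf' before it is started.

    kdc.stop();
  }
}
{code}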
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046350#comment-14046350 ] Hadoop QA commented on YARN-614: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652876/YARN-614.12.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4124//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4124//console This message is automatically generated. > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, > YARN-614.8.patch, YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1713: --- Priority: Blocker (was: Major) Target Version/s: 2.5.0 > Implement getnewapplication and submitapp as part of RM web service > --- > > Key: YARN-1713 > URL: https://issues.apache.org/jira/browse/YARN-1713 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-1713.3.patch, apache-yarn-1713.4.patch, > apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, apache-yarn-1713.7.patch, > apache-yarn-1713.8.patch, apache-yarn-1713.cumulative.2.patch, > apache-yarn-1713.cumulative.3.patch, apache-yarn-1713.cumulative.4.patch, > apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1695) Implement the rest (writable APIs) of RM web-services
[ https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1695: --- Priority: Blocker (was: Major) > Implement the rest (writable APIs) of RM web-services > - > > Key: YARN-1695 > URL: https://issues.apache.org/jira/browse/YARN-1695 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Varun Vasudev >Priority: Blocker > > MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs > added there were only focused on obtaining information from the cluster. We > need to have the following REST APIs to finish the feature > - Application submission/termination (Priority): This unblocks easy client > interaction with a YARN cluster > - Application Client protocol: For resource scheduling by apps written in an > arbitrary language. Will have to think about throughput concerns > - ContainerManagement Protocol: Again for arbitrary language apps. > One important thing to note here is that we already have client libraries on > all the three protocols that do some heavy-lifting. One part of the > effort is to figure out if they can be made any thinner and/or how > web-services will implement the same functionality. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
[ https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046328#comment-14046328 ] Vinod Kumar Vavilapalli commented on YARN-1373: --- Since YARN-1210, we have always had the app and app-attempt move to the RUNNING state after RM restarts. That's why it is a dup. > Transition RMApp and RMAppAttempt state to RUNNING after restart for > recovered running apps > --- > > Key: YARN-1373 > URL: https://issues.apache.org/jira/browse/YARN-1373 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Omkar Vinit Joshi > > Currently the RM moves recovered app attempts to a terminal recovered > state and starts a new attempt. Instead, it will have to transition the last > attempt to a running state such that it can proceed as normal once the > running attempt has resynced with the ApplicationMasterService (YARN-1365 and > YARN-1366). If the RM had started the application container before dying, then > the AM would be up and trying to contact the RM. The RM may have died > before launching the container. For this case, the RM should wait for the AM > liveliness period and issue a kill for the stored master container. > It should transition this attempt to some RECOVER_ERROR state and proceed to > start a new attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-614: --- Attachment: YARN-614.12.patch > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, > YARN-614.8.patch, YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046300#comment-14046300 ] Xuan Gong commented on YARN-614: Not sure why this org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions fails, it passed on my local machine. org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter is not related For org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart, it fails because of time-out. I added more logic on the test case, I need to increase the time-out. Submitted new patch to kick the Jenkins again.. > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, > YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046219#comment-14046219 ] Hudson commented on YARN-2204: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5790 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5790/]) YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606168) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler > --- > > Key: YARN-2204 > URL: https://issues.apache.org/jira/browse/YARN-2204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Trivial > Fix For: 2.5.0 > > Attachments: YARN-2204.patch, YARN-2204_addendum.patch, > YARN-2204_addendum.patch > > > TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046217#comment-14046217 ] Hadoop QA commented on YARN-614: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652862/YARN-614.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4123//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4123//console This message is automatically generated. > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, > YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046199#comment-14046199 ] Hadoop QA commented on YARN-1408: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652860/Yarn-1408.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4122//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4122//console This message is automatically generated. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capcity is been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046176#comment-14046176 ] Hadoop QA commented on YARN-1366: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652857/YARN-1366.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestAMRMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4121//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4121//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4121//console This message is automatically generated. > AM should implement Resync with the ApplicationMasterService instead of > shutting down > - > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, > YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > calling resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0 and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM then > things should proceed like normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-614: --- Attachment: YARN-614.11.patch Added more testcases > Separate AM failures from hardware failure or YARN error and do not count > them to AM retry count > > > Key: YARN-614 > URL: https://issues.apache.org/jira/browse/YARN-614 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Xuan Gong > Fix For: 2.5.0 > > Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, > YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, > YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, > YARN-614.9.patch > > > Attempts can fail due to a large number of user errors and they should not be > retried unnecessarily. The only reason YARN should retry an attempt is when > the hardware fails or YARN has an error. NM failing, lost NM and NM disk > errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: Yarn-1408.5.patch > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capcity is been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: (was: Yarn-1408.5.patch) > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capcity is been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.5.patch I updated the patch for following incremental change. 1. Reregister for AmRMClient if unregister throw ApplicationMasterNotRegisteredException. 2. Unregister will be called only if it is registered. Please review the updated patch > AM should implement Resync with the ApplicationMasterService instead of > shutting down > - > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, > YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > calling resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0 and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM then > things should proceed like normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046113#comment-14046113 ] Eric Payne commented on YARN-415: - Test failures for TestRMApplicationHistoryWriter predate this patch. > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
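The proposed metric reduces to a simple sum over containers. Below is a minimal sketch of the arithmetic described above; the container data and the ContainerUsage type are illustrative, not a YARN API.
{code}
// Sketch of the MB-seconds chargeback metric described above: the sum over
// all containers of (reserved memory in MB) x (container lifetime in
// seconds). The ContainerUsage type is illustrative, not a YARN class.
import java.util.Arrays;
import java.util.List;

public class MemorySecondsExample {

  static class ContainerUsage {
    final long reservedMb;       // memory reserved for the container
    final long lifetimeSeconds;  // how long the container was allocated
    ContainerUsage(long reservedMb, long lifetimeSeconds) {
      this.reservedMb = reservedMb;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  static long memorySeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      // Charge for reserved memory even if it was not fully used,
      // because no one else could use it while it was reserved.
      total += c.reservedMb * c.lifetimeSeconds;
    }
    return total;
  }

  public static void main(String[] args) {
    List<ContainerUsage> app = Arrays.asList(
        new ContainerUsage(1024, 600),   // e.g. the AM container
        new ContainerUsage(2048, 300),   // a task container
        new ContainerUsage(2048, 450));  // another task container
    System.out.println(memorySeconds(app) + " MB-seconds");  // 2150400
  }
}
{code}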
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046109#comment-14046109 ] Ray Chiang commented on YARN-2201: -- +1 for the latest patch. The tests are now independent of changes in yarn-default.xml. > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482) > I'm opening this JIRA as a discussion for the best way to fix this. I've got > a few ideas,
[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046065#comment-14046065 ] john lilley commented on YARN-896: -- Greetings! Arun pointed me to this JIRA to see if this could potentially meet our needs. We are an ISV that currently ships a data-quality/integration suite running as a native YARN application. We are finding several use cases that would benefit from being able to manage a per-node persistent service. MapReduce has its “shuffle auxiliary service”, but it isn’t straightforward to add auxiliary services because they cannot be loaded from HDFS, so we’d have to manage the distribution of JARs across nodes (please tell me if I’m wrong here…). This seems to be addressing a lot of the issues around persistent services, and frankly I'm out of my depth in this discussion. But if you all can help me understand if this might help our situation, I'd be happy to have our team put shoulder to the wheel and help advance the development. Please comment on our contemplated use case and help me understand if this is the right place to be. Our software doesn't use MapReduce. It is a pure YARN application that is basically a peer to MapReduce. There are a lot of reasons for this decision, but the main one is that we have a large code base that already executes data transformations in a single-server environment, and we wanted to produce a product without rewriting huge swaths of code. Given that, our software takes care of many things usually delegated to MapReduce, including distributed sort/partition (i.e. "the shuffle"). However, MapReduce has a special place in the ecosystem, in that it creates an auxiliary service to handle the distribution of shuffle data to reducers. It doesn't look like third-party apps have an easy time installing aux services. The JARs for any such service must be in Hadoop's classpath on all nodes at startup, creating both a management issue and a trust/security issue. Currently our software places temporary data into HDFS for this purpose, but we've found that HDFS has a huge overhead in terms of performance and file handles, even at low replication. We desire to replace the use of HDFS with a lighter-weight service to manage temp files and distribute their data. > Roll up for long-lived services in YARN > --- > > Key: YARN-896 > URL: https://issues.apache.org/jira/browse/YARN-896 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Robert Joseph Evans > > YARN is intended to be general purpose, but it is missing some features to be > able to truly support long lived applications and long lived containers. > This ticket is intended to > # discuss what is needed to support long lived processes > # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046028#comment-14046028 ] Tsuyoshi OZAWA commented on YARN-2034: -- +1(non-binding) > Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect > > > Key: YARN-2034 > URL: https://issues.apache.org/jira/browse/YARN-2034 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Chen He >Priority: Minor > Labels: documentation > Attachments: YARN-2034.patch, YARN-2034.patch > > > The description in yarn-default.xml for > yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per > local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
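For readers following the patch, a sketch of what the corrected yarn-default.xml entry might look like; the wording below is illustrative rather than the attached patch's exact text, and 10240 is the commonly shipped default.

{code}
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
  <description>Target size of the localized resource cache in MB. This is a
    limit for the entire NodeManager, not a per-local-directory setting.</description>
</property>
{code}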
[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046019#comment-14046019 ] Hadoop QA commented on YARN-2034: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652832/YARN-2034.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4120//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4120//console This message is automatically generated. > Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect > > > Key: YARN-2034 > URL: https://issues.apache.org/jira/browse/YARN-2034 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Chen He >Priority: Minor > Labels: documentation > Attachments: YARN-2034.patch, YARN-2034.patch > > > The description in yarn-default.xml for > yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per > local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2222) Helper script: looping a test until it fails
Tsuyoshi OZAWA created YARN-2222: Summary: Helper script: looping a test until it fails Key: YARN-2222 URL: https://issues.apache.org/jira/browse/YARN-2222 Project: Hadoop YARN Issue Type: Improvement Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Some tests can fail intermittently because of timing bugs. To reproduce such a test failure, it's useful to add a script which launches the specified test repeatedly until it fails. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2034: -- Attachment: YARN-2034.patch resubmit to trigger HadoopQA > Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect > > > Key: YARN-2034 > URL: https://issues.apache.org/jira/browse/YARN-2034 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Chen He >Priority: Minor > Labels: documentation > Attachments: YARN-2034.patch, YARN-2034.patch > > > The description in yarn-default.xml for > yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per > local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2034: -- Labels: documentation (was: ) > Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect > > > Key: YARN-2034 > URL: https://issues.apache.org/jira/browse/YARN-2034 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Chen He >Priority: Minor > Labels: documentation > Attachments: YARN-2034.patch > > > The description in yarn-default.xml for > yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per > local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045991#comment-14045991 ] Hadoop QA commented on YARN-2034: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644210/YARN-2034.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4118//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4118//console This message is automatically generated. > Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect > > > Key: YARN-2034 > URL: https://issues.apache.org/jira/browse/YARN-2034 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Chen He >Priority: Minor > Attachments: YARN-2034.patch > > > The description in yarn-default.xml for > yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per > local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2178) TestApplicationMasterService sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045974#comment-14045974 ] Tsuyoshi OZAWA commented on YARN-2178: -- [~mitdesai] [~ted_yu] FYI: I use this bash script to reproduce timing bugs: https://github.com/oza/failchecker {code} $ ./failchecker TestApplicationMasterService {code} This script runs the specified test iteratively until it fails. > TestApplicationMasterService sometimes fails in trunk > - > > Key: YARN-2178 > URL: https://issues.apache.org/jira/browse/YARN-2178 > Project: Hadoop YARN > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Labels: test > > From https://builds.apache.org/job/Hadoop-Yarn-trunk/587/ : > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 55.763 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService > testInvalidContainerReleaseRequest(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService) > Time elapsed: 41.336 sec <<< FAILURE! > java.lang.AssertionError: AppAttempt state is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:401) > at > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testInvalidContainerReleaseRequest(TestApplicationMasterService.java:143) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
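For readers who do not want to pull the linked repository, a minimal sketch of the same loop-until-failure idea, assuming a standard Maven build; the actual failchecker script may differ.

{code}
#!/usr/bin/env bash
# Run the given test class repeatedly until it fails, keeping the last log.
# Usage: ./loop-test.sh TestApplicationMasterService
set -u
test_class="$1"
run=0
while true; do
  run=$((run + 1))
  echo "=== run ${run}: ${test_class} ==="
  if ! mvn -q test -Dtest="${test_class}" > last-run.log 2>&1; then
    echo "Test failed on run ${run}; see last-run.log"
    exit 1
  fi
done
{code}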
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045970#comment-14045970 ] Hadoop QA commented on YARN-1408: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652828/Yarn-1408.5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4119//console This message is automatically generated. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
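For reference, the setup described in the report corresponds roughly to the following configuration. The monitor properties are quoted from the issue, while the capacity-scheduler.xml queue entries are the usual way to express the 80/20 split and are assumed rather than copied from the reporter's cluster.

{code}
<!-- yarn-site.xml: enable the preemption monitor. -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>

<!-- capacity-scheduler.xml: two queues with an 80/20 capacity split. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>20</value>
</property>
{code}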
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: Yarn-1408.5.patch Hi [~vinodkv] [~leftnoteasy] Please find the initial patch. Some information about the patch: * While recovering a ResourceRequest, if such an entry is found in the scheduling info, the number of containers is incremented; otherwise it is added as a new entry. * An OffRack request is also added while recovering, if the stored request is not OffRack. * The AM may have asked for NodeLocal containers on other hosts, which may not be recoverable. Kindly review. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045959#comment-14045959 ] Jason Lowe commented on YARN-1341: -- Agree it's not ideal to discuss handling state store errors for all NM components in this JIRA. In general I'd prefer to discuss and address each case with the corresponding JIRA, e.g.: application state store errors discussed and addressed in YARN-1354, container state store errors in YARN-1337, etc. If we feel there's significant utility to committing a JIRA before all the issues are addressed then we can file one or more followup JIRAs to track those outstanding issues. That's the normal process we follow with other features/fixes as well. So if we follow that process then we're back to the discussion about RM master keys not being able to be stored in the state store. The choices we've discussed are: 1) Log an error, update the master key in memory, and continue 2) Log an error, _not_ update the master key in memory, and continue 3) Log an error and tear down the NM I'd prefer 1) since that is the option that preserves the most work in all scenarios I can think of, and I don't know of a scenario where 2) would handle it better. However I could be convinced given the right scenario. I'd really rather avoid 3) since that seems like a severe way to "handle" the error and guarantees work is lost. Oh there is one more handling scenario we briefly discussed where we flag the NM as "undesirable". When that occurs we don't shoot the containers that are running, but we avoid adding new containers since the node is having issues (i.e.: a drain-decommission). I feel that would be a separate JIRA since it needs YARN-914, and we'd still need to decide how to handle the error until the decommission is complete (i.e.: choice 1 or 2 above). > Recover NMTokens upon nodemanager restart > - > > Key: YARN-1341 > URL: https://issues.apache.org/jira/browse/YARN-1341 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, > YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
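To make option 1 above concrete, here is a hedged sketch of that error-handling shape; all class, interface, and method names are illustrative placeholders, not the actual NodeManager secret-manager or state-store APIs.

{code}
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch of option 1: if persisting the new RM master key to the NM state
 * store fails, log the error, still apply the key in memory, and keep the
 * NodeManager running.
 */
public class MasterKeyUpdateSketch {

  /** Placeholder for the NM recovery state store. */
  public interface KeyStore {
    void storeMasterKey(byte[] serializedKey) throws IOException;
  }

  private static final Logger LOG =
      Logger.getLogger(MasterKeyUpdateSketch.class.getName());

  private final KeyStore store;
  private volatile byte[] currentKey;

  public MasterKeyUpdateSketch(KeyStore store) {
    this.store = store;
  }

  public void onNewMasterKey(byte[] newKey) {
    try {
      // Persist first so the key would survive an NM restart.
      store.storeMasterKey(newKey);
    } catch (IOException e) {
      // Option 1: do not tear the NM down; recovery after a restart may be
      // degraded, but running containers and new work are preserved.
      LOG.log(Level.SEVERE,
          "Unable to persist new RM master key; continuing with in-memory update", e);
    }
    // Update in memory regardless of whether the store write succeeded.
    this.currentKey = newKey;
  }
}
{code}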
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045945#comment-14045945 ] Tsuyoshi OZAWA commented on YARN-570: - {quote} The format of JavaScript Date.toLocaleString() varies by the browser. {quote} One alternative to make the formats consistent is to change {{renderHadoopDate}} to return the same format as {{yarn.util.Times#format()}} does, instead of using {{Date#toLocaleString}}. [~ajisakaa], [~qwertymaniac], what do you think? > Time strings are formated in different timezone > --- > > Key: YARN-570 > URL: https://issues.apache.org/jira/browse/YARN-570 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.2.0 >Reporter: Peng Zhang >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch > > > Time strings on different page are displayed in different timezone. > If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as > "Wed, 10 Apr 2013 08:29:56 GMT" > If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 > 16:29:56" > Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
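For reference, a small sketch of producing the "10-Apr-2013 16:29:56"-style output quoted in the description, in the viewer's local timezone; the pattern is inferred from that example, not copied from Times#format.

{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeFormatExample {
  public static void main(String[] args) {
    // Pattern inferred from the "10-Apr-2013 16:29:56" example in the issue.
    SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    // Uses the JVM's default (local) timezone, matching the Times#format
    // behavior described in the issue.
    System.out.println(fmt.format(new Date()));
  }
}
{code}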
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045926#comment-14045926 ] Tsuyoshi OZAWA commented on YARN-1514: -- [~kkambatl], could you take a look at this JIRA? This tool is useful and I hope to include this feature in the 2.5.0 release. > Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA > > > Key: YARN-1514 > URL: https://issues.apache.org/jira/browse/YARN-1514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Fix For: 2.5.0 > > Attachments: YARN-1514.1.patch, YARN-1514.2.patch, > YARN-1514.wip-2.patch, YARN-1514.wip.patch > > > ZKRMStateStore is very sensitive to ZNode-related operations as discussed in > YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is > called when RM-HA cluster does failover. Therefore, its execution time > impacts failover time of RM-HA. > We need utility to benchmark time execution time of ZKRMStateStore#loadStore > as development tool. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1872) DistributedShell occasionally keeps running endlessly
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-1872: -- Summary: DistributedShell occasionally keeps running endlessly (was: TestDistributedShell occasionally fails in trunk) > DistributedShell occasionally keeps running endlessly > - > > Key: YARN-1872 > URL: https://issues.apache.org/jira/browse/YARN-1872 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ted Yu >Assignee: Hong Zhiguo > Attachments: TestDistributedShell.out, YARN-1872.patch > > > From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : > TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and > TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045799#comment-14045799 ] Tsuyoshi OZAWA commented on YARN-2130: -- The test failure of TestRMApplicationHistoryWriter is not related and the issue is filed as YARN-2216. > Cleanup: Adding getRMAppManager, getQueueACLsManager, > getApplicationACLsManager to RMContext > > > Key: YARN-2130 > URL: https://issues.apache.org/jira/browse/YARN-2130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, > YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045795#comment-14045795 ] Hadoop QA commented on YARN-2130: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652792/YARN-2130.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4117//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4117//console This message is automatically generated. > Cleanup: Adding getRMAppManager, getQueueACLsManager, > getApplicationACLsManager to RMContext > > > Key: YARN-2130 > URL: https://issues.apache.org/jira/browse/YARN-2130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, > YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045772#comment-14045772 ] Tsuyoshi OZAWA commented on YARN-570: - [~qwertymaniac], Thank you for the review. If we want to make the time format completely consistent, we need to change many parts to use the same format function. As a temporary fix that addresses this issue first, Akira's patch looks good to me. What do you think? I think the timezone difference frequently confuses users, so we should fix it in the next release (2.5.0). > Time strings are formated in different timezone > --- > > Key: YARN-570 > URL: https://issues.apache.org/jira/browse/YARN-570 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.2.0 >Reporter: Peng Zhang >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch > > > Time strings on different page are displayed in different timezone. > If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as > "Wed, 10 Apr 2013 08:29:56 GMT" > If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 > 16:29:56" > Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.6.patch Rebased on trunk. > Cleanup: Adding getRMAppManager, getQueueACLsManager, > getApplicationACLsManager to RMContext > > > Key: YARN-2130 > URL: https://issues.apache.org/jira/browse/YARN-2130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, > YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045745#comment-14045745 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652788/trust001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1266 javac compiler warnings (more than the trunk's current 1258 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-auth. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4116//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4116//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4116//console This message is automatically generated. > Add one service to check the nodes' TRUST status > - > > Key: YARN-2142 > URL: https://issues.apache.org/jira/browse/YARN-2142 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager, scheduler >Affects Versions: 2.2.0 > Environment: OS:Ubuntu 13.04; > JAVA:OpenJDK 7u51-2.4.4-0 >Reporter: anders >Priority: Minor > Labels: patch > Fix For: 2.2.0 > > Attachments: test.patch, trust.patch, trust.patch, trust.patch, > trust001.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Because of critical computing environment ,we must test every node's TRUST > status in the cluster (We can get the TRUST status by the API of OAT > sever),So I add this feature into hadoop's schedule . > By the TRUST check service ,node can get the TRUST status of itself, > then through the heartbeat ,send the TRUST status to resource manager for > scheduling. > In the scheduling step,if the node's TRUST status is 'false', it will be > abandoned until it's TRUST status turn to 'true'. > ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045742#comment-14045742 ] Hadoop QA commented on YARN-2104: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652783/YARN-2104.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4113//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4113//console This message is automatically generated. > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045740#comment-14045740 ] Hadoop QA commented on YARN-570: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644756/YARN-570.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4115//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4115//console This message is automatically generated. > Time strings are formated in different timezone > --- > > Key: YARN-570 > URL: https://issues.apache.org/jira/browse/YARN-570 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.2.0 >Reporter: Peng Zhang >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch > > > Time strings on different page are displayed in different timezone. > If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as > "Wed, 10 Apr 2013 08:29:56 GMT" > If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 > 16:29:56" > Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045726#comment-14045726 ] Varun Vasudev commented on YARN-2201: - Test failure is unrelated. > TestRMWebServicesAppsModification dependent on yarn-default.xml > --- > > Key: YARN-2201 > URL: https://issues.apache.org/jira/browse/YARN-2201 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Varun Vasudev > Labels: test > Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, > apache-yarn-2201.2.patch, apache-yarn-2201.3.patch > > > TestRMWebServicesAppsModification.java has some errors that are > yarn-default.xml dependent. By changing yarn-default.xml properties, I'm > seeing the following errors: > 1) Changing yarn.resourcemanager.scheduler.class from > capacity.CapacityScheduler to fair.FairScheduler gives the error: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 3.22 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > 2) Changing yarn.acl.enable from false to true results in the following > errors: > Running > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification > testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.986 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287) > testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.258 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369) > testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 2.263 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458) > testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) > Time elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482) > I'm opening this JIRA as a discussion for the best way to fix this. I've got > a few ideas, but I would like to get some feedback about potentially
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust001.patch modify the xml > Add one service to check the nodes' TRUST status > - > > Key: YARN-2142 > URL: https://issues.apache.org/jira/browse/YARN-2142 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager, scheduler >Affects Versions: 2.2.0 > Environment: OS:Ubuntu 13.04; > JAVA:OpenJDK 7u51-2.4.4-0 >Reporter: anders >Priority: Minor > Labels: patch > Fix For: 2.2.0 > > Attachments: test.patch, trust.patch, trust.patch, trust.patch, > trust001.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Because of critical computing environment ,we must test every node's TRUST > status in the cluster (We can get the TRUST status by the API of OAT > sever),So I add this feature into hadoop's schedule . > By the TRUST check service ,node can get the TRUST status of itself, > then through the heartbeat ,send the TRUST status to resource manager for > scheduling. > In the scheduling step,if the node's TRUST status is 'false', it will be > abandoned until it's TRUST status turn to 'true'. > ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: (was: trust.patch) > Add one service to check the nodes' TRUST status > - > > Key: YARN-2142 > URL: https://issues.apache.org/jira/browse/YARN-2142 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager, scheduler >Affects Versions: 2.2.0 > Environment: OS:Ubuntu 13.04; > JAVA:OpenJDK 7u51-2.4.4-0 >Reporter: anders >Priority: Minor > Labels: patch > Fix For: 2.2.0 > > Attachments: test.patch, trust.patch, trust.patch, trust.patch, > trust001.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Because of critical computing environment ,we must test every node's TRUST > status in the cluster (We can get the TRUST status by the API of OAT > sever),So I add this feature into hadoop's schedule . > By the TRUST check service ,node can get the TRUST status of itself, > then through the heartbeat ,send the TRUST status to resource manager for > scheduling. > In the scheduling step,if the node's TRUST status is 'false', it will be > abandoned until it's TRUST status turn to 'true'. > ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045720#comment-14045720 ] Tsuyoshi OZAWA commented on YARN-2052: -- The test failure of TestRMApplicationHistoryWriter is filed as YARN-2216. This failure is not related to this JIRA. [~jianhe] [~vinodkv], can you take a look, please? > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, > YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, > YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue
[ https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045717#comment-14045717 ] Peng Zhang commented on YARN-1810: -- OK, I created JIRA: https://issues.apache.org/jira/browse/YARN-2221 > YARN RM Webapp Application page Issue > - > > Key: YARN-1810 > URL: https://issues.apache.org/jira/browse/YARN-1810 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.3.0 >Reporter: Ethan Setnik > Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot > 2014-03-11 at 1.40.12 PM.png > > > When browsing the ResourceManager's web interface I am presented with the > attached screenshot. > I can't understand why it does not show the applications, even though there > is no search text. The application counts show the correct values relative > to the submissions, successes, and failures. > Also see the text in the screenshot: > "Showing 0 to 0 of 0 entries (filtered from 19 total entries)" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2221) WebUI: RM scheduler page's queue filter status will affect application page
Peng Zhang created YARN-2221: Summary: WebUI: RM scheduler page's queue filter status will affect application page Key: YARN-2221 URL: https://issues.apache.org/jira/browse/YARN-2221 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Peng Zhang Priority: Minor The apps queue filter added by clicking a queue bar in the scheduler page affects the display of the applications page. No filter query is shown on the applications page, which causes confusion. Also, we cannot reset the filter query on the applications page; we must come back to the scheduler page and click the "root" queue to reset it. Reproduce steps: {code} 1) Configure two queues under root (A & B) 2) Run some apps using queue A and B respectively 3) Click “A” queue in scheduler page 4) Click “Applications”, only apps of queue A show 5) Click “B” queue in scheduler page 6) Click “Applications”, only apps of queue B show {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust.patch > Add one service to check the nodes' TRUST status > - > > Key: YARN-2142 > URL: https://issues.apache.org/jira/browse/YARN-2142 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager, scheduler >Affects Versions: 2.2.0 > Environment: OS:Ubuntu 13.04; > JAVA:OpenJDK 7u51-2.4.4-0 >Reporter: anders >Priority: Minor > Labels: patch > Fix For: 2.2.0 > > Attachments: test.patch, trust.patch, trust.patch, trust.patch, > trust.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Because of critical computing environment ,we must test every node's TRUST > status in the cluster (We can get the TRUST status by the API of OAT > sever),So I add this feature into hadoop's schedule . > By the TRUST check service ,node can get the TRUST status of itself, > then through the heartbeat ,send the TRUST status to resource manager for > scheduling. > In the scheduling step,if the node's TRUST status is 'false', it will be > abandoned until it's TRUST status turn to 'true'. > ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045713#comment-14045713 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652787/trust.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4114//console This message is automatically generated. > Add one service to check the nodes' TRUST status > - > > Key: YARN-2142 > URL: https://issues.apache.org/jira/browse/YARN-2142 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager, scheduler >Affects Versions: 2.2.0 > Environment: OS:Ubuntu 13.04; > JAVA:OpenJDK 7u51-2.4.4-0 >Reporter: anders >Priority: Minor > Labels: patch > Fix For: 2.2.0 > > Attachments: test.patch, trust.patch, trust.patch, trust.patch, > trust.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Because of critical computing environment ,we must test every node's TRUST > status in the cluster (We can get the TRUST status by the API of OAT > sever),So I add this feature into hadoop's schedule . > By the TRUST check service ,node can get the TRUST status of itself, > then through the heartbeat ,send the TRUST status to resource manager for > scheduling. > In the scheduling step,if the node's TRUST status is 'false', it will be > abandoned until it's TRUST status turn to 'true'. > ***The logic of this feature is similar to node's health checkservice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045708#comment-14045708 ] Peng Zhang commented on YARN-2104: -- Looks good to me. > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue
[ https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045697#comment-14045697 ] Wangda Tan commented on YARN-1810: -- I've uploaded a simple fix to YARN-2104, please kindly review! [~peng.zhang], good suggestion, could you create a JIRA to track it? > YARN RM Webapp Application page Issue > - > > Key: YARN-1810 > URL: https://issues.apache.org/jira/browse/YARN-1810 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.3.0 >Reporter: Ethan Setnik > Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot > 2014-03-11 at 1.40.12 PM.png > > > When browsing the ResourceManager's web interface I am presented with the > attached screenshot. > I can't understand why it does not show the applications, even though there > is no search text. The application counts show the correct values relative > to the submissions, successes, and failures. > Also see the text in the screenshot: > "Showing 0 to 0 of 0 entries (filtered from 19 total entries)" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2104: - Attachment: YARN-2104.patch Attached a simple fix for this > Scheduler queue filter failed to work because index of queue column changed > --- > > Key: YARN-2104 > URL: https://issues.apache.org/jira/browse/YARN-2104 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2104.patch > > > YARN-563 added, > {code} > + th(".type", "Application Type”). > {code} > to application table, which makes queue’s column index from 3 to 4. And in > scheduler page, queue’s column index is hard coded to 3 when filter > application with queue’s name, > {code} > "if (q == 'root') q = '';", > "else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';", > "$('#apps').dataTable().fnFilter(q, 3, true);", > {code} > So queue filter will not work for application page. > Reproduce steps: (Thanks Bo Yang for pointing this) > {code} > 1) In default setup, there’s a default queue under root queue > 2) Run an arbitrary application, you can find it in “Applications” page > 3) Click “Default” queue in scheduler page > 4) Click “Applications”, no application will show here > 5) Click “Root” queue in scheduler page > 6) Click “Applications”, application will show again > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045681#comment-14045681 ] Hadoop QA commented on YARN-2052: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652767/YARN-2052.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4112//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4112//console This message is automatically generated. > ContainerId creation after work preserving restart is broken > > > Key: YARN-2052 > URL: https://issues.apache.org/jira/browse/YARN-2052 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, > YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, > YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch > > > Container ids are made unique by using the app identifier and appending a > monotonically increasing sequence number to it. Since container creation is a > high churn activity the RM does not store the sequence number per app. So > after restart it does not know what the new sequence number should be for new > allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue
[ https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045679#comment-14045679 ] Peng Zhang commented on YARN-1810: -- I updated the field number in {{$('#apps').dataTable().fnFilter(q, 3, true);}} from 3 to 4; after clicking the “default” queue bar, applications no longer disappear. But I found that this fnFilter query is carried over to the “Applications” page. As we have multiple queues, if I click one of them in the scheduler page and go to the applications page, only applications of the clicked queue are shown; other applications are filtered out. Since no filter query is shown on the page, this may cause confusion. > YARN RM Webapp Application page Issue > - > > Key: YARN-1810 > URL: https://issues.apache.org/jira/browse/YARN-1810 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.3.0 >Reporter: Ethan Setnik > Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot > 2014-03-11 at 1.40.12 PM.png > > > When browsing the ResourceManager's web interface I am presented with the > attached screenshot. > I can't understand why it does not show the applications, even though there > is no search text. The application counts show the correct values relative > to the submissions, successes, and failures. > Also see the text in the screenshot: > "Showing 0 to 0 of 0 entries (filtered from 19 total entries)" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045668#comment-14045668 ] Wangda Tan commented on YARN-2163: -- Thanks [~raviprak] for review and commit! > WebUI: Order of AppId in apps table should be consistent with > ApplicationId.compareTo(). > > > Key: YARN-2163 > URL: https://issues.apache.org/jira/browse/YARN-2163 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Minor > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2163.patch, apps page.png > > > Currently, AppId is treated as numeric, so the sort result in applications > table is sorted by int typed id only (not included cluster timestamp), see > attached screenshot. Order of AppId in web page should be consistent with > ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
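For context on the ordering the fix targets, a hedged sketch of comparing application ids by cluster timestamp first and then by the sequential id, which a purely numeric sort on the id column does not respect; this is illustrative, not the upstream ApplicationId implementation.

{code}
import java.util.Comparator;

/** Illustrative ordering: cluster timestamp first, then the sequential id. */
final class AppIdKey {
  final long clusterTimestamp;
  final int id;

  AppIdKey(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  static final Comparator<AppIdKey> ORDER =
      Comparator.comparingLong((AppIdKey a) -> a.clusterTimestamp)
                .thenComparingInt(a -> a.id);
}
{code}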