[jira] [Commented] (YARN-1883) TetsRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948581#comment-13948581 ]

Mit Desai commented on YARN-1883:

Working on the fix. Will be posting a patch soon.

TetsRMAdminService fails due to inconsistent entries in UserGroups
------------------------------------------------------------------

Key: YARN-1883
URL: https://issues.apache.org/jira/browse/YARN-1883
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: java7

testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error:

{noformat}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
{noformat}

The line numbers may be inconsistent because I was running the tests in a particular order, but the line on which the failure occurs is:

{code}
Assert.assertTrue(groupBefore.contains(test_group_A)
    && groupBefore.contains(test_group_B)
    && groupBefore.contains(test_group_C)
    && groupBefore.size() == 3);
{code}

testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() both call {{MockUnixGroupsMapping.updateGroups()}}, which changes the list of user groups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them, and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made the changes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
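The order dependence described above, where both tests mutate a shared static group list, can be sketched as follows. This is a hedged illustration with hypothetical names (SharedGroupsMapping, resetGroups), not the actual MockUnixGroupsMapping code; it only shows why a per-test reset makes the "before" assertion hold regardless of execution order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MockUnixGroupsMapping: a shared static group
// list that updateGroups() mutates, which is what makes the tests
// order-dependent.
class SharedGroupsMapping {
    private static List<String> groups = defaultGroups();

    static List<String> defaultGroups() {
        return new ArrayList<>(
            Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
    }

    static List<String> getGroups() {
        return groups;
    }

    // Both tests call this; whichever test runs second sees mutated state.
    static void updateGroups() {
        groups = new ArrayList<>(
            Arrays.asList("test_group_D", "test_group_E", "test_group_F"));
    }

    // One fix direction: reset the shared state before each test (e.g. from
    // an @Before method) so assertions on the initial groups are
    // independent of test order.
    static void resetGroups() {
        groups = defaultGroups();
    }
}

public class GroupResetDemo {
    public static void main(String[] args) {
        SharedGroupsMapping.updateGroups();   // simulates the earlier test running first
        SharedGroupsMapping.resetGroups();    // per-test reset
        List<String> groupBefore = SharedGroupsMapping.getGroups();
        System.out.println(groupBefore.contains("test_group_A")
            && groupBefore.contains("test_group_B")
            && groupBefore.contains("test_group_C")
            && groupBefore.size() == 3);      // prints: true
    }
}
```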
[jira] [Updated] (YARN-1883) TetsRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1883:
Attachment: YARN-1883.patch

Attaching the patch for trunk and branch-2.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946939#comment-13946939 ]

Mit Desai commented on YARN-1873:

I see that YARN-1872 is a timeout issue. The error reported here is an assertion failure, so I think this is a different issue.

TestDistributedShell#testDSShell fails
--------------------------------------

Key: YARN-1873
URL: https://issues.apache.org/jira/browse/YARN-1873
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: java7

testDSShell fails when the tests are run in random order. I see a cleanup issue here.

{noformat}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 44.127 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<6>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)

Results :

Failed tests:
  TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
{noformat}

The line numbers may deviate slightly because I was trying to reproduce the error by running the tests in a specific order, but the line that causes the assertion failure is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}}
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Affects Version/s: 2.4.0
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946971#comment-13946971 ]

Mit Desai commented on YARN-1873:

Yes, it fails on branch-2.4; I updated the JIRA to reflect that. I am using JDK7 and have not tried JDK6, but since this is clearly a cleanup issue, I assumed it is a JDK7 issue. If I run testDSShell independently, it never fails.
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Attachment: YARN-1873.patch

Attaching the patch.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947066#comment-13947066 ]

Mit Desai commented on YARN-1873:

[~zjshen], I didn't see your comment. Attached the patch already.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947114#comment-13947114 ]

Mit Desai commented on YARN-1854:

[~rohithsharma]: The logs that I submitted already have the 5-second timeout change. I am creating another JIRA for the issue. Can you please update the description of this JIRA so that it describes what it actually fixes?

TestRMHA#testStartAndTransitions Fails
--------------------------------------

Key: YARN-1854
URL: https://issues.apache.org/jira/browse/YARN-1854
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
Fix For: 2.4.0
Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947119#comment-13947119 ]

Mit Desai commented on YARN-1854:

Created YARN-1875 to track the issue.
[jira] [Updated] (YARN-1875) TestRMHA#testStartAndTransitions is failing
[ https://issues.apache.org/jira/browse/YARN-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1875:
Attachment: Log.rtf

Attaching the logs for the failure.

TestRMHA#testStartAndTransitions is failing
-------------------------------------------

Key: YARN-1875
URL: https://issues.apache.org/jira/browse/YARN-1875
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Attachments: Log.rtf

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}
[jira] [Created] (YARN-1875) TestRMHA#testStartAndTransitions is failing
Mit Desai created YARN-1875:
----------------------------

Summary: TestRMHA#testStartAndTransitions is failing
Key: YARN-1875
URL: https://issues.apache.org/jira/browse/YARN-1875
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Attachments: Log.rtf
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947176#comment-13947176 ]

Mit Desai commented on YARN-1873:

I'll change them and upload a new patch. Thanks for the review!
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Attachment: YARN-1873.patch

Attached the new patch.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947244#comment-13947244 ]

Mit Desai commented on YARN-1873:

I looked at it. Can you let me know if we need to make it
{{protected final String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}}
or
{{protected static final String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}}?
I think making it final instead of static final will be enough. What do you say?
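The final-vs-static-final question above comes down to when the initializer runs: a static field is evaluated once per class load and then shared by every test instance, while an instance final field is re-evaluated for each new test object. A minimal sketch of that difference, with hypothetical names (compute() stands in for JarFinder.getJar, and none of this is the actual TestDistributedShell code):

```java
// Hypothetical sketch: compute() stands in for JarFinder.getJar(...).
class JarLocator {
    private static int calls = 0;

    static String compute() {
        calls++;                      // count initializer evaluations
        return "jar-" + calls;
    }
}

class PerInstanceField {
    // Re-evaluated for every test object the runner creates.
    final String appMasterJar = JarLocator.compute();
}

class PerClassField {
    // Evaluated exactly once, when the class is initialized; every
    // test instance then shares the same (possibly stale) value.
    static final String APPMASTER_JAR = JarLocator.compute();
}

public class StaticVsInstanceDemo {
    public static void main(String[] args) {
        PerInstanceField first = new PerInstanceField();
        PerInstanceField second = new PerInstanceField();
        // Instance fields get fresh values per object.
        System.out.println(first.appMasterJar.equals(second.appMasterJar)); // prints: false
        // The static field is the same value on every access.
        System.out.println(PerClassField.APPMASTER_JAR
            .equals(PerClassField.APPMASTER_JAR));                          // prints: true
    }
}
```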
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945188#comment-13945188 ]

Mit Desai commented on YARN-1670:

I realize that I created the patch based on trunk before the earlier patch was committed, so it fails to apply. I will upload a new one.

[~jeagles]
# Nice logic. This is much easier to understand. I will incorporate your suggestion in the new change.
# For the buffer size, you are correct. I already did some analysis on that; the discussions/articles I read online say that a 64K buffer size performs efficiently.

aggregated log writer can write more log data then it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

{noformat}
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
{noformat}

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the log length was written as a certain size but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the log file length is, but then when it goes to write the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said the length was. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

{code}
while (len != -1 && curRead < fileLength) {
{code}

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits the loop.
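The fix direction described in the report, stopping the copy once the declared length is reached and tracking the position in a long, can be sketched as below. This is a hedged illustration, not the actual LogValue.write() code; copyUpTo and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a length-bounded copy: record the file length up
// front, then copy at most that many bytes even if the file keeps growing
// while it is being aggregated.
public class BoundedCopyDemo {
    static long copyUpTo(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[65536];          // 64K buffer, per the discussion above
        long curRead = 0;                      // long, not int: logs can exceed 2GB
        int len;
        while (curRead < declaredLength
                && (len = in.read(buf, 0,
                        (int) Math.min(buf.length, declaredLength - curRead))) != -1) {
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];           // the file grew to 100 bytes after
                                               // the length (64) was recorded
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = copyUpTo(new ByteArrayInputStream(data), out, 64);
        System.out.println(written);           // prints: 64 (copy stops at declared length)
    }
}
```

The reader then never encounters extra log bytes where it expects the next file-type header, which is the failure mode traced in the description.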
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1670:
Attachment: YARN-1670-v4-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4.patch Attaching the patch with the updated changes. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 2.4.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. {noformat} at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) {noformat} We traced it down to the reader trying to read the file type of the next file while its position is still inside log data from the previous file. What happened was the Log Length was written as a certain size, but the log data was actually longer than that. Inside of the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small.
We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user that the log might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType: it uses an int for curRead whereas it should use a long: {code} while (len != -1 && curRead < fileLength) { {code} This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits the loop. -- This message was sent by Atlassian JIRA (v6.2#6252)
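The fix described above — capping write() at the length it already recorded, and tracking curRead as a long so files over 2 GB are handled — can be sketched as follows. This is a minimal, self-contained sketch with invented names (BoundedLogCopy, copyUpTo); the real change lives in AggregatedLogFormat.LogValue.write() and readAContainerLogsForALogType().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedLogCopy {
    // Copy at most 'fileLength' bytes from 'in' to 'out', even if the
    // underlying file keeps growing while aggregation is in progress.
    static long copyUpTo(InputStream in, OutputStream out, long fileLength)
            throws IOException {
        byte[] buf = new byte[65536];
        long curRead = 0;            // long, not int: log files can exceed 2 GB
        int len;
        while ((len = in.read(buf)) != -1 && curRead < fileLength) {
            long remaining = fileLength - curRead;
            // On the last iteration write only the remaining bytes,
            // not the whole buffer.
            int toWrite = (int) Math.min(len, remaining);
            out.write(buf, 0, toWrite);
            curRead += toWrite;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        // Pretend the log grew to 100,000 bytes after a length of 70,000
        // had already been written to the aggregated file's header.
        byte[] data = new byte[100_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = copyUpTo(new ByteArrayInputStream(data), out, 70_000);
        System.out.println(written);   // prints 70000
    }
}
```

The key detail is the last iteration: when a buffer read returns more bytes than the declared length still allows, only fileLength - curRead bytes are written, so a log that keeps growing during aggregation cannot overrun its recorded length.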
[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1854: Attachment: Log.rtf [~rohithsharma], [~vinodkv], I have added the logs for the failure that I mentioned before. I found this failure in our nightly builds. TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096> {noformat}
[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails
[ https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945730#comment-13945730 ] Mit Desai commented on YARN-1837: - This test is also failing for us. TestMoveApplication.testMoveRejectedByScheduler randomly fails -- Key: YARN-1837 URL: https://issues.apache.org/jira/browse/YARN-1837 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Tsuyoshi OZAWA TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It looks to be caused by unhandled exception handling on the server side.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4.patch [~tgraves], [~jeagles] and [~vinodkv], I am adding a new patch. I have included a check in the while loop to make sure that we do not write the whole buffer when the last iteration has fewer file bytes left than the buffer size.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v3-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v3.patch Thanks [~vinodkv] for the feedback. 1- I changed the formatting. 2- I have modified the patch to use less memory. It should work now. I have also tested the new patch in my Eclipse IDE with HeapSize=1GB and the test passes every time I run it.
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943497#comment-13943497 ] Mit Desai commented on YARN-1670: - That's correct, Vinod. In the last iteration, where the buffer length is greater than the remaining portion of the file, we will have to write {{fileLength - curRead}} bytes.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941744#comment-13941744 ] Mit Desai commented on YARN-1854: - Thanks [~rohithsharma] for taking this JIRA. For the first failure, I lost the logs and I am not able to reproduce it again. I will provide the logs once I can make it fail.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941787#comment-13941787 ] Mit Desai commented on YARN-1854: - I assume this is the same failure as YARN-1786.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v2-b23.patch Adding patches for trunk, branch-2 and branch-23 (this patch has the previous change plus a unit test verifying the change).
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v2.patch
[jira] [Created] (YARN-1854) TestRMHA#testStartAndTransitions Fails
Mit Desai created YARN-1854: --- Summary: TestRMHA#testStartAndTransitions Fails Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940707#comment-13940707 ] Mit Desai commented on YARN-1854: - I got that failure in our nightly builds. When I tested it on my local machine, I got the same error. But now when I try testing it again, I get the following error intermittently. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 1.755 sec FAILURE! java.lang.AssertionError: Incorrect value for metric appsPending expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:384) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:154) Results : Failed tests: TestRMHA.testStartAndTransitions:154->verifyClusterMetrics:384->assertMetric:396 Incorrect value for metric appsPending expected:<1> but was:<0> {noformat}
[jira] [Created] (YARN-1833) TestRMAdminService Fails in branch-2
Mit Desai created YARN-1833: --- Summary: TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed: {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails whenever the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should not be harmful.
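A minimal sketch of why the size-based assert is brittle, and what a robust form of the same intent looks like. The group names below are hypothetical examples mirroring the values the test uses; Collections.disjoint is one way to express the "initial groups contain none of the test groups" check that the description says is already present.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GroupAssertSketch {
    public static void main(String[] args) {
        // Hypothetical values: groupWithInit holds the machine's default
        // groups, groupBefore holds the test's configured groups. Both
        // happen to have three entries, so the size-inequality assert fails.
        List<String> groupWithInit = Arrays.asList("users", "sshusers", "wheel");
        List<String> groupBefore =
            Arrays.asList("test_group_A", "test_group_B", "test_group_C");

        // Brittle: false whenever the default-group count equals the
        // test-group count, which depends on the machine running the test.
        boolean sizeCheck = groupWithInit.size() != groupBefore.size();

        // Robust: the initial groups must not contain any test group,
        // regardless of how many default groups the machine has.
        boolean disjoint = Collections.disjoint(groupWithInit, groupBefore);

        System.out.println(sizeCheck + " " + disjoint);   // prints false true
    }
}
```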
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935331#comment-13935331 ] Mit Desai commented on YARN-1833: - I am in the process of generating the patch. I will be uploading it soon.
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833.patch Attaching the patch for trunk and branch-2.
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935502#comment-13935502 ] Mit Desai commented on YARN-1591: - Hey, I have done a little investigation on the test. {code} static { DefaultMetricsSystem.setMiniClusterMode(true); } {code} Setting this property makes the metrics system ignore a Metrics source that already exists in the unit test. This change seems to be working on my local machine. What do you guys think of it? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too.
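To illustrate why mini-cluster mode helps, here is a toy model (ToyMetricsSystem is invented for illustration; the real behavior lives in Hadoop's DefaultMetricsSystem): the metrics registry is JVM-wide, so a test that starts several ResourceManagers in one JVM registers the same source names repeatedly, and without mini-cluster mode the second registration throws.

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-in for DefaultMetricsSystem: a JVM-wide registry that
// rejects duplicate metrics-source names unless mini-cluster mode is on.
public class ToyMetricsSystem {
    private final Map<String, Object> sources = new HashMap<>();
    private boolean miniClusterMode = false;

    public void setMiniClusterMode(boolean enabled) {
        miniClusterMode = enabled;
    }

    public void register(String name, Object source) {
        if (sources.containsKey(name) && !miniClusterMode) {
            // Mirrors the "Metrics source X already exists!" failures seen
            // when several ResourceManagers start inside one test JVM.
            throw new IllegalStateException(
                "Metrics source " + name + " already exists!");
        }
        sources.put(name, source);   // in mini-cluster mode, silently replace
    }

    public static void main(String[] args) {
        ToyMetricsSystem ms = new ToyMetricsSystem();
        ms.setMiniClusterMode(true);             // what the static block does
        ms.register("ClusterMetrics", new Object());
        ms.register("ClusterMetrics", new Object()); // tolerated, no exception
        System.out.println("ok");                // prints ok
    }
}
```

Putting the call in a static initializer, as in the comment above, ensures the flag is set before any test method registers a source.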
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935505#comment-13935505 ] Mit Desai commented on YARN-1833: - Thanks Akira. Even I verified that it is not related to my patch. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails because the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833-v2.patch Thanks [~jeagles] for the suggestion. I did not think about that solution. Attaching the new patch, which uses the dummyUser for the test; no assert is removed. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails because the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
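The dummy-user approach fixes the test by making it independent of whichever groups the machine's real user happens to have. A hedged sketch of the idea (DummyGroupsMapping, "dummyuser", and the group names are illustrative stand-ins, not the identifiers from the actual YARN-1833 patch): the test resolves groups for a made-up user through a stub mapping it fully controls.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stub group mapping for test isolation: group lookups for the dummy
// user return a fixed list, so assertions never depend on the host's
// /etc/group contents (e.g. users, sshusers, wheel).
public class DummyGroupsMapping {
    private final Map<String, List<String>> groups = new HashMap<>();

    public DummyGroupsMapping() {
        groups.put("dummyuser",
            Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
    }

    public List<String> getGroups(String user) {
        return groups.getOrDefault(user, Collections.emptyList());
    }

    public static void main(String[] args) {
        DummyGroupsMapping m = new DummyGroupsMapping();
        // Deterministic on every machine, unlike the current user's real groups.
        System.out.println(m.getGroups("dummyuser"));
    }
}
```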
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-b23.patch aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
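The fix the description asks for can be sketched as a bounded copy: record the length, then copy at most that many bytes even if the source file has grown in the meantime. This is a hedged illustration, not the actual LogValue.write() from the patch; copyBounded and boundedCopyOf are hypothetical helpers. Note curRead is a long, matching the int-vs-long issue also mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Copy at most declaredLength bytes, so the aggregated entry never
// exceeds the length recorded ahead of it, even if the log file is
// still being appended to during aggregation.
public class BoundedLogCopy {
    public static long copyBounded(InputStream in, OutputStream out,
                                   long declaredLength) throws IOException {
        byte[] buf = new byte[4096];
        long curRead = 0;              // long, not int
        int len;
        while (curRead < declaredLength
            && (len = in.read(buf, 0,
                   (int) Math.min(buf.length, declaredLength - curRead))) != -1) {
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    // Convenience for in-memory data; IOException cannot occur here.
    public static byte[] boundedCopyOf(byte[] src, long declaredLength) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyBounded(new ByteArrayInputStream(src), out, declaredLength);
            return out.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // The log "grew" to 15 bytes after its length was recorded as 10.
        byte[] copied = boundedCopyOf("0123456789EXTRA".getBytes(), 10);
        System.out.println(copied.length);  // 10: the extra bytes are not copied
    }
}
```

With this shape, a reader that trusts the recorded length never lands mid-way through stale log data when it looks for the next file's type.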
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Attaching the patch for trunk, branch2 and branch23. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Updated patch for trunk and branch-2. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920137#comment-13920137 ] Mit Desai commented on YARN-1670: - It is a code change and there are no unit tests for this change. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned YARN-1670: --- Assignee: Mit Desai aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905799#comment-13905799 ] Mit Desai commented on YARN-1281: - Is this failure just related to the test or is there some bug in Hadoop? TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906020#comment-13906020 ] Mit Desai commented on YARN-1281: - I had tried it on my machine and it was passing too. Just wanted to make sure it is a test issue and not a real bug. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved YARN-1281. - Resolution: Cannot Reproduce Target Version/s: (was: ) This JIRA has been open for a long time and the issue does not seem to be reproducible. I am closing it for now. We can open it again if we find out that it is failing again. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened YARN-1281: - I see Karthik. Reopening it TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892754#comment-13892754 ] Mit Desai commented on YARN-1628: - I have a workaround. Will get back with what I find. TestContainerManagerSecurity fails on trunk --- Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1628.1.patch, YARN-1628.patch The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1628) TestContainerManagerSecurity fails on trunk
Mit Desai created YARN-1628: --- Summary: TestContainerManagerSecurity fails on trunk Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 3.0.0 Reporter: Mit Desai Assignee: Mit Desai The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1628: Attachment: YARN-1628.patch Attaching the patch. The argument InvalidHost was not a valid host name, which resulted in an UnknownHostException being thrown. TestContainerManagerSecurity fails on trunk --- Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1628.patch The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
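The failure mode can be reproduced with plain JDK classes: java.net.InetSocketAddress marks a host it cannot resolve as "unresolved", and Hadoop's SecurityUtil.buildTokenService wraps the resulting UnknownHostException in the IllegalArgumentException seen in the stack trace. A small sketch (the host names below are illustrative; ".invalid" is a reserved TLD that is guaranteed not to resolve, unlike an arbitrary string that a wildcard DNS setup might accidentally match):

```java
import java.net.InetSocketAddress;

// Shows how an unresolvable host surfaces before any token code runs:
// the socket address itself reports isUnresolved() == true.
public class UnresolvedHostDemo {
    public static boolean isUnresolved(String host) {
        return new InetSocketAddress(host, 1234).isUnresolved();
    }

    public static void main(String[] args) {
        System.out.println(isUnresolved("some.host.invalid")); // true: cannot resolve
        System.out.println(isUnresolved("127.0.0.1"));         // false: numeric, no DNS
    }
}
```

This is why the test fix is to pass a resolvable (or numeric) address rather than a placeholder string like InvalidHost.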
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838260#comment-13838260 ] Mit Desai commented on YARN-1463: - Just for information, This also causes the TestJHSSecurity#testDeligationToken to fail TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838262#comment-13838262 ] Mit Desai commented on YARN-1463: - Correction: TestJHSSecurity#testDelegationToken TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Description: Added in the YARN-1021 patch, the test TestSLSRunner is now failing. (was: Added in the YARn-1021 patch, the test TestSLSRunner is now failing.) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Added in the YARN-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Summary: TestSLSRunner test is failing (was: TestSLSRunner is failing) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Added in the YARn-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785547#comment-13785547 ] Mit Desai commented on YARN-1021: - Hey Wei, FYI, I would like to inform you that the test TestSLSRunner is failing. I have created a new JIRA for that: YARN-1270 Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm performs for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. 
This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters, handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including: * Resource usages for the whole cluster and each queue, which can be utilized to configure cluster and queue capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
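The "scheduler wrapper" idea above can be sketched with a simple timing decorator: intercept each scheduler operation, measure it, and accumulate per-operation cost. Scheduler and its allocate/handle methods here are stand-ins for illustration, not the real YARN scheduler interface or the SLS wrapper class.

```java
import java.util.HashMap;
import java.util.Map;

// Decorator that times each operation of a wrapped scheduler and
// accumulates total nanoseconds per operation name.
public class TimingSchedulerWrapper {
    public interface Scheduler {
        void allocate();
        void handle();
    }

    private final Scheduler real;
    private final Map<String, Long> totalNanos = new HashMap<>();

    public TimingSchedulerWrapper(Scheduler real) { this.real = real; }

    private void timed(String op, Runnable body) {
        long start = System.nanoTime();
        body.run();
        totalNanos.merge(op, System.nanoTime() - start, Long::sum);
    }

    public void allocate() { timed("allocate", real::allocate); }
    public void handle()   { timed("handle", real::handle); }

    public Map<String, Long> costs() { return totalNanos; }

    public static void main(String[] args) {
        TimingSchedulerWrapper w = new TimingSchedulerWrapper(new Scheduler() {
            public void allocate() { /* simulated scheduling work */ }
            public void handle()   { /* simulated event handling */ }
        });
        w.allocate();
        w.handle();
        w.allocate();
        System.out.println(w.costs().keySet());
    }
}
```

Because the wrapper sits between the event dispatcher and the real scheduler, the measured costs reflect the actual scheduler code paths, which is what makes the per-operation metrics in the description possible.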
[jira] [Updated] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1199: Attachment: YARN-1199.patch Thanks Rob for pointing this out. I have made the changes to the patch and attached it. Can you please review it? Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119.patch Patch posted for trunk Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Assignee: Mit Desai Attachments: YARN-1119.patch, YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Assignee: Mit Desai Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Assignee: Mit Desai Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119-v1-b23.patch Patch posted for branch 0.23. Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-897: --- Attachment: YARN-897-08152013-br-0.23.patch Posted patch for branch-0.23 CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Djellel Eddine Difallah Assignee: Djellel Eddine Difallah Priority: Blocker Fix For: 2.1.0-beta Attachments: TestBugParentQueue.java, YARN-897-08152013-br-0.23.patch, YARN-897-1.patch, YARN-897-2.patch, YARN-897-3.patch, YARN-897-4.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
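The corruption mechanism described above is a general TreeSet pitfall: mutating a field that the comparator reads, while the element is still in the set, silently invalidates the tree's ordering. A minimal demonstration (Queue is a stand-in for the scheduler's child queue class, not the YARN code); the fix pattern, remove / update / re-insert, is what re-sorting on container completion amounts to:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Shows a TreeSet going stale when an in-set element's sort key is
// mutated, and the remove/update/re-insert pattern that keeps it valid.
public class TreeSetKeyMutation {
    static class Queue {
        final String name;
        double usedCapacity;
        Queue(String name, double used) { this.name = name; this.usedCapacity = used; }
    }

    static TreeSet<Queue> newQueueSet() {
        return new TreeSet<>(Comparator
            .comparingDouble((Queue q) -> q.usedCapacity)
            .thenComparing(q -> q.name));
    }

    public static void main(String[] args) {
        TreeSet<Queue> queues = newQueueSet();
        Queue a = new Queue("a", 0.9);
        queues.add(a);
        queues.add(new Queue("b", 0.5));
        queues.add(new Queue("c", 0.1));

        // WRONG: mutate the sort key in place (e.g. containers completed).
        a.usedCapacity = 0.0;
        System.out.println(queues.first().name);  // stale order: not "a"
        System.out.println(queues.remove(a));     // lookup is broken too

        // RIGHT: take the element out, update it, re-insert it.
        TreeSet<Queue> fixed = newQueueSet();
        Queue q = new Queue("q", 0.9);
        fixed.add(q);
        fixed.add(new Queue("r", 0.5));
        fixed.remove(q);
        q.usedCapacity = 0.0;
        fixed.add(q);
        System.out.println(fixed.first().name);   // "q", as expected
    }
}
```

In the scheduler, the stale ordering means an under-capacity queue is no longer found first, which is exactly the starvation the issue describes.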