[jira] [Commented] (YARN-1883) TetsRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948581#comment-13948581 ]

Mit Desai commented on YARN-1883:

Working on the fix. Will be posting a patch soon.

TetsRMAdminService fails due to inconsistent entries in UserGroups
------------------------------------------------------------------

Key: YARN-1883
URL: https://issues.apache.org/jira/browse/YARN-1883
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: java7

testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error:

{noformat}
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
{noformat}

The line numbers may be inconsistent because I was running the tests in a particular order, but the line on which the failure occurs is:

{code}
Assert.assertTrue(groupBefore.contains(test_group_A)
    && groupBefore.contains(test_group_B)
    && groupBefore.contains(test_group_C)
    && groupBefore.size() == 3);
{code}

testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() both call {{MockUnixGroupsMapping.updateGroups()}}, which changes the list of user groups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them, and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made the changes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
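The order dependence described above, where both tests mutate a shared static group list, can be sketched as follows. This is a hedged illustration with hypothetical names (SharedGroupsMapping, resetGroups), not the actual MockUnixGroupsMapping code; it only shows why a per-test reset makes the "before" assertion hold regardless of execution order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MockUnixGroupsMapping: a shared static group
// list that updateGroups() mutates, which is what makes the tests
// order-dependent.
class SharedGroupsMapping {
    private static List<String> groups = defaultGroups();

    static List<String> defaultGroups() {
        return new ArrayList<>(
            Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
    }

    static List<String> getGroups() {
        return groups;
    }

    // Both tests call this; whichever test runs second sees mutated state.
    static void updateGroups() {
        groups = new ArrayList<>(
            Arrays.asList("test_group_D", "test_group_E", "test_group_F"));
    }

    // One fix direction: reset the shared state before each test (e.g. from
    // an @Before method) so assertions on the initial groups are
    // independent of test order.
    static void resetGroups() {
        groups = defaultGroups();
    }
}

public class GroupResetDemo {
    public static void main(String[] args) {
        SharedGroupsMapping.updateGroups();   // simulates the earlier test running first
        SharedGroupsMapping.resetGroups();    // per-test reset
        List<String> groupBefore = SharedGroupsMapping.getGroups();
        System.out.println(groupBefore.contains("test_group_A")
            && groupBefore.contains("test_group_B")
            && groupBefore.contains("test_group_C")
            && groupBefore.size() == 3);      // prints: true
    }
}
```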
[jira] [Updated] (YARN-1883) TetsRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1883:
Attachment: YARN-1883.patch

Attaching the patch for trunk and branch-2.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946939#comment-13946939 ]

Mit Desai commented on YARN-1873:

I see that YARN-1872 is a timeout issue. The error reported here is an assertion failure, so I think this is a different issue.

TestDistributedShell#testDSShell fails
--------------------------------------

Key: YARN-1873
URL: https://issues.apache.org/jira/browse/YARN-1873
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
Labels: java7

testDSShell fails when the tests are run in random order. I see a cleanup issue here.

{noformat}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 44.127 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<6>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)

Results :

Failed tests:
  TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
{noformat}

The line numbers may deviate slightly because I was trying to reproduce the error by running the tests in a specific order, but the line that causes the assertion failure is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}}
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Affects Version/s: 2.4.0
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946971#comment-13946971 ]

Mit Desai commented on YARN-1873:

Yes, it fails on branch-2.4; I updated the JIRA to reflect that. I am using JDK7 and have not tried JDK6, but since this is clearly a cleanup issue, I assumed it is a JDK7 issue. If I run testDSShell independently, it never fails.
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Attachment: YARN-1873.patch

Attaching the patch.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947066#comment-13947066 ]

Mit Desai commented on YARN-1873:

[~zjshen], I didn't see your comment. Attached the patch already.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947114#comment-13947114 ]

Mit Desai commented on YARN-1854:

[~rohithsharma]: The logs that I submitted already have the 5-second timeout change. I am creating another JIRA for the issue. Can you please update the description of this JIRA so that it describes what it actually fixes?

TestRMHA#testStartAndTransitions Fails
--------------------------------------

Key: YARN-1854
URL: https://issues.apache.org/jira/browse/YARN-1854
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
Fix For: 2.4.0
Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947119#comment-13947119 ]

Mit Desai commented on YARN-1854:

Created YARN-1875 to track the issue.
[jira] [Updated] (YARN-1875) TestRMHA#testStartAndTransitions is failing
[ https://issues.apache.org/jira/browse/YARN-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1875:
Attachment: Log.rtf

Attaching the logs for the failure.

TestRMHA#testStartAndTransitions is failing
-------------------------------------------

Key: YARN-1875
URL: https://issues.apache.org/jira/browse/YARN-1875
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Attachments: Log.rtf

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}
[jira] [Created] (YARN-1875) TestRMHA#testStartAndTransitions is failing
Mit Desai created YARN-1875:
----------------------------

Summary: TestRMHA#testStartAndTransitions is failing
Key: YARN-1875
URL: https://issues.apache.org/jira/browse/YARN-1875
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Attachments: Log.rtf
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947176#comment-13947176 ]

Mit Desai commented on YARN-1873:

I'll change them and upload a new patch. Thanks for the review!
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1873:
Attachment: YARN-1873.patch

Attached the new patch.
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947244#comment-13947244 ]

Mit Desai commented on YARN-1873:

I looked at it. Can you let me know if we need to make it
{{protected final String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}}
or
{{protected static final String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}}?
I think making it final instead of static final will be enough. What do you say?
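The final-vs-static-final question above comes down to when the initializer runs: a static field is evaluated once per class load and then shared by every test instance, while an instance final field is re-evaluated for each new test object. A minimal sketch of that difference, with hypothetical names (compute() stands in for JarFinder.getJar, and none of this is the actual TestDistributedShell code):

```java
// Hypothetical sketch: compute() stands in for JarFinder.getJar(...).
class JarLocator {
    private static int calls = 0;

    static String compute() {
        calls++;                      // count initializer evaluations
        return "jar-" + calls;
    }
}

class PerInstanceField {
    // Re-evaluated for every test object the runner creates.
    final String appMasterJar = JarLocator.compute();
}

class PerClassField {
    // Evaluated exactly once, when the class is initialized; every
    // test instance then shares the same (possibly stale) value.
    static final String APPMASTER_JAR = JarLocator.compute();
}

public class StaticVsInstanceDemo {
    public static void main(String[] args) {
        PerInstanceField first = new PerInstanceField();
        PerInstanceField second = new PerInstanceField();
        // Instance fields get fresh values per object.
        System.out.println(first.appMasterJar.equals(second.appMasterJar)); // prints: false
        // The static field is the same value on every access.
        System.out.println(PerClassField.APPMASTER_JAR
            .equals(PerClassField.APPMASTER_JAR));                          // prints: true
    }
}
```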
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945188#comment-13945188 ]

Mit Desai commented on YARN-1670:

I realize that I created the patch based on trunk before the earlier patch was committed, so it fails to apply. I will upload a new one.

[~jeagles]
# Nice logic. This is much easier to understand. I will incorporate your suggestion in the new change.
# For the buffer size, you are correct. I already did some analysis on that; the discussions/articles I read online say that a 64K buffer size performs efficiently.

aggregated log writer can write more log data then it says is the log length
-----------------------------------------------------------------------------

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 2.4.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files:

{noformat}
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
	at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
{noformat}

We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was that the log length was written as a certain size but the log data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the log file length is, but then when it goes to write the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said the length was. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.

We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:

{code}
while (len != -1 && curRead < fileLength) {
{code}

This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits the loop.
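The fix direction described in the report, stopping the copy once the declared length is reached and tracking the position in a long, can be sketched as below. This is a hedged illustration, not the actual LogValue.write() code; copyUpTo and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a length-bounded copy: record the file length up
// front, then copy at most that many bytes even if the file keeps growing
// while it is being aggregated.
public class BoundedCopyDemo {
    static long copyUpTo(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[65536];          // 64K buffer, per the discussion above
        long curRead = 0;                      // long, not int: logs can exceed 2GB
        int len;
        while (curRead < declaredLength
                && (len = in.read(buf, 0,
                        (int) Math.min(buf.length, declaredLength - curRead))) != -1) {
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];           // the file grew to 100 bytes after
                                               // the length (64) was recorded
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = copyUpTo(new ByteArrayInputStream(data), out, 64);
        System.out.println(written);           // prints: 64 (copy stops at declared length)
    }
}
```

The reader then never encounters extra log bytes where it expects the next file-type header, which is the failure mode traced in the description.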
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated YARN-1670:
Attachment: YARN-1670-v4-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4.patch Attaching the patch with the updated changes. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Fix For: 2.4.0 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. {noformat} at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) {noformat} We traced it down to the reader trying to read the file type of the next file while its position is still inside log data from the previous file. What happened was the Log Length was written as a certain size, but the log data was actually longer than that. Inside of the write() routine in LogValue, it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small.
We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user that the log might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType: it uses an int for curRead whereas it should use a long: {code} while (len != -1 && curRead < fileLength) { {code} This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits the loop. -- This message was sent by Atlassian JIRA (v6.2#6252)
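The fix described above — capping write() at the length it already recorded, and tracking curRead as a long so files over 2 GB are handled — can be sketched as follows. This is a minimal, self-contained sketch with invented names (BoundedLogCopy, copyUpTo); the real change lives in AggregatedLogFormat.LogValue.write() and readAContainerLogsForALogType().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedLogCopy {
    // Copy at most 'fileLength' bytes from 'in' to 'out', even if the
    // underlying file keeps growing while aggregation is in progress.
    static long copyUpTo(InputStream in, OutputStream out, long fileLength)
            throws IOException {
        byte[] buf = new byte[65536];
        long curRead = 0;            // long, not int: log files can exceed 2 GB
        int len;
        while ((len = in.read(buf)) != -1 && curRead < fileLength) {
            long remaining = fileLength - curRead;
            // On the last iteration write only the remaining bytes,
            // not the whole buffer.
            int toWrite = (int) Math.min(len, remaining);
            out.write(buf, 0, toWrite);
            curRead += toWrite;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        // Pretend the log grew to 100,000 bytes after a length of 70,000
        // had already been written to the aggregated file's header.
        byte[] data = new byte[100_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = copyUpTo(new ByteArrayInputStream(data), out, 70_000);
        System.out.println(written);   // prints 70000
    }
}
```

The key detail is the last iteration: when a buffer read returns more bytes than the declared length still allows, only fileLength - curRead bytes are written, so a log that keeps growing during aggregation cannot overrun its recorded length.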
[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1854: Attachment: Log.rtf [~rohithsharma], [~vinodkv], I have added the logs for the failure that I mentioned before. I found this failure in our nightly builds. TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096> {noformat}
[jira] [Commented] (YARN-1837) TestMoveApplication.testMoveRejectedByScheduler randomly fails
[ https://issues.apache.org/jira/browse/YARN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945730#comment-13945730 ] Mit Desai commented on YARN-1837: - This test is also failing for us. TestMoveApplication.testMoveRejectedByScheduler randomly fails -- Key: YARN-1837 URL: https://issues.apache.org/jira/browse/YARN-1837 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Tsuyoshi OZAWA TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It looks to be caused by unhandled exception handling on the server side.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v4.patch [~tgraves], [~jeagles] and [~vinodkv], I am adding a new patch. I have included a check in the while loop to make sure that we do not write the whole buffer when the last iteration has fewer file bytes left than the buffer size.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v3-b23.patch
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v3.patch Thanks [~vinodkv] for the feedback. 1- I changed the formatting. 2- I have modified the patch to use less memory. It should work now. I have also tested the new patch in my Eclipse IDE with HeapSize=1GB and the test passes every time I run it.
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943497#comment-13943497 ] Mit Desai commented on YARN-1670: - That's correct, Vinod. In the last iteration, where the buffer length is greater than the remaining portion of the file, we will have to write {{fileLength - curRead}} bytes.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941744#comment-13941744 ] Mit Desai commented on YARN-1854: - Thanks [~rohithsharma] for taking this JIRA. For the first failure, I lost the logs and I am not able to reproduce it again. I will provide the logs once I can make it fail.
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941787#comment-13941787 ] Mit Desai commented on YARN-1854: - I assume this is the same failure as YARN-1786.
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v2-b23.patch Adding patches for trunk, branch-2 and branch-23 (this patch has the previous change plus a unit test verifying the change).
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-v2.patch
[jira] [Created] (YARN-1854) TestRMHA#testStartAndTransitions Fails
Mit Desai created YARN-1854: --- Summary: TestRMHA#testStartAndTransitions Fails Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940707#comment-13940707 ] Mit Desai commented on YARN-1854: - I got that failure in our nightly builds. When I tested it on my local machine, I got the same error. But now when I try testing it again, I get the following error intermittently. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 1.755 sec FAILURE! java.lang.AssertionError: Incorrect value for metric appsPending expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:384) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:154) Results : Failed tests: TestRMHA.testStartAndTransitions:154->verifyClusterMetrics:384->assertMetric:396 Incorrect value for metric appsPending expected:<1> but was:<0> {noformat}
[jira] [Created] (YARN-1833) TestRMAdminService Fails in branch-2
Mit Desai created YARN-1833: --- Summary: TestRMAdminService Fails in branch-2 Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed: {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails whenever the sizes of groupWithInit and groupBefore happen to be the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should not be harmful.
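A minimal sketch of why the size-based assert is brittle, and what a robust form of the same intent looks like. The group names below are hypothetical examples mirroring the values the test uses; Collections.disjoint is one way to express the "initial groups contain none of the test groups" check that the description says is already present.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GroupAssertSketch {
    public static void main(String[] args) {
        // Hypothetical values: groupWithInit holds the machine's default
        // groups, groupBefore holds the test's configured groups. Both
        // happen to have three entries, so the size-inequality assert fails.
        List<String> groupWithInit = Arrays.asList("users", "sshusers", "wheel");
        List<String> groupBefore =
            Arrays.asList("test_group_A", "test_group_B", "test_group_C");

        // Brittle: false whenever the default-group count equals the
        // test-group count, which depends on the machine running the test.
        boolean sizeCheck = groupWithInit.size() != groupBefore.size();

        // Robust: the initial groups must not contain any test group,
        // regardless of how many default groups the machine has.
        boolean disjoint = Collections.disjoint(groupWithInit, groupBefore);

        System.out.println(sizeCheck + " " + disjoint);   // prints false true
    }
}
```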
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935331#comment-13935331 ] Mit Desai commented on YARN-1833: - I am in the process of generating the patch. I will be uploading it soon.
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833.patch Attaching the patch for trunk and branch-2.
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935502#comment-13935502 ] Mit Desai commented on YARN-1591: - Hey, I have done a little investigation on the test. {code} static { DefaultMetricsSystem.setMiniClusterMode(true); } {code} Setting this property makes the metrics system ignore a Metrics source that already exists in the unit test. This change seems to be working on my local machine. What do you guys think of it? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too.
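To illustrate why mini-cluster mode helps, here is a toy model (ToyMetricsSystem is invented for illustration; the real behavior lives in Hadoop's DefaultMetricsSystem): the metrics registry is JVM-wide, so a test that starts several ResourceManagers in one JVM registers the same source names repeatedly, and without mini-cluster mode the second registration throws.

```java
import java.util.HashMap;
import java.util.Map;

// Invented stand-in for DefaultMetricsSystem: a JVM-wide registry that
// rejects duplicate metrics-source names unless mini-cluster mode is on.
public class ToyMetricsSystem {
    private final Map<String, Object> sources = new HashMap<>();
    private boolean miniClusterMode = false;

    public void setMiniClusterMode(boolean enabled) {
        miniClusterMode = enabled;
    }

    public void register(String name, Object source) {
        if (sources.containsKey(name) && !miniClusterMode) {
            // Mirrors the "Metrics source X already exists!" failures seen
            // when several ResourceManagers start inside one test JVM.
            throw new IllegalStateException(
                "Metrics source " + name + " already exists!");
        }
        sources.put(name, source);   // in mini-cluster mode, silently replace
    }

    public static void main(String[] args) {
        ToyMetricsSystem ms = new ToyMetricsSystem();
        ms.setMiniClusterMode(true);             // what the static block does
        ms.register("ClusterMetrics", new Object());
        ms.register("ClusterMetrics", new Object()); // tolerated, no exception
        System.out.println("ok");                // prints ok
    }
}
```

Putting the call in a static initializer, as in the comment above, ensures the flag is set before any test method registers a source.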
[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935505#comment-13935505 ] Mit Desai commented on YARN-1833: - Thanks Akira. Even I verified that it is not related to my patch. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails because the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1833: Attachment: YARN-1833-v2.patch Thanks [~jeagles] for the suggestion. I did not think about that solution. Attaching the new patch, which uses the dummyUser for the test; no assert is removed. TestRMAdminService Fails in trunk and branch-2 -- Key: YARN-1833 URL: https://issues.apache.org/jira/browse/YARN-1833 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: Test Attachments: YARN-1833-v2.patch, YARN-1833.patch In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} As the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails because the sizes of groupWithInit and groupBefore are the same. I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the userGroups that are in groupBefore, so removing the assert should be harmless. -- This message was sent by Atlassian JIRA (v6.2#6252)
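The dummy-user approach fixes the test by making it independent of whichever groups the machine's real user happens to have. A hedged sketch of the idea (DummyGroupsMapping, "dummyuser", and the group names are illustrative stand-ins, not the identifiers from the actual YARN-1833 patch): the test resolves groups for a made-up user through a stub mapping it fully controls.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stub group mapping for test isolation: group lookups for the dummy
// user return a fixed list, so assertions never depend on the host's
// /etc/group contents (e.g. users, sshusers, wheel).
public class DummyGroupsMapping {
    private final Map<String, List<String>> groups = new HashMap<>();

    public DummyGroupsMapping() {
        groups.put("dummyuser",
            Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
    }

    public List<String> getGroups(String user) {
        return groups.getOrDefault(user, Collections.emptyList());
    }

    public static void main(String[] args) {
        DummyGroupsMapping m = new DummyGroupsMapping();
        // Deterministic on every machine, unlike the current user's real groups.
        System.out.println(m.getGroups("dummyuser"));
    }
}
```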
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-b23.patch aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
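The fix the description asks for can be sketched as a bounded copy: record the length, then copy at most that many bytes even if the source file has grown in the meantime. This is a hedged illustration, not the actual LogValue.write() from the patch; copyBounded and boundedCopyOf are hypothetical helpers. Note curRead is a long, matching the int-vs-long issue also mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Copy at most declaredLength bytes, so the aggregated entry never
// exceeds the length recorded ahead of it, even if the log file is
// still being appended to during aggregation.
public class BoundedLogCopy {
    public static long copyBounded(InputStream in, OutputStream out,
                                   long declaredLength) throws IOException {
        byte[] buf = new byte[4096];
        long curRead = 0;              // long, not int
        int len;
        while (curRead < declaredLength
            && (len = in.read(buf, 0,
                   (int) Math.min(buf.length, declaredLength - curRead))) != -1) {
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    // Convenience for in-memory data; IOException cannot occur here.
    public static byte[] boundedCopyOf(byte[] src, long declaredLength) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyBounded(new ByteArrayInputStream(src), out, declaredLength);
            return out.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // The log "grew" to 15 bytes after its length was recorded as 10.
        byte[] copied = boundedCopyOf("0123456789EXTRA".getBytes(), 10);
        System.out.println(copied.length);  // 10: the extra bytes are not copied
    }
}
```

With this shape, a reader that trusts the recorded length never lands mid-way through stale log data when it looks for the next file's type.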
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Attaching the patch for trunk, branch2 and branch23. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Updated patch for trunk and branch-2. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920137#comment-13920137 ] Mit Desai commented on YARN-1670: - It is a code change and there are no unit tests for this change. aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned YARN-1670: --- Assignee: Mit Desai aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but the position it reads from still holds log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside of the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905799#comment-13905799 ] Mit Desai commented on YARN-1281: - Is this failure just related to the test or is there some bug in Hadoop? TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906020#comment-13906020 ] Mit Desai commented on YARN-1281: - I had tried it on my machine and it was passing too. Just wanted to make sure it is a test issue and not a real bug. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved YARN-1281. - Resolution: Cannot Reproduce Target Version/s: (was: ) This JIRA has been open for a long time and the issue does not seem to be reproducible. I am closing it for now. We can open it again if we find out that it is failing again. TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened YARN-1281: - I see Karthik. Reopening it TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892754#comment-13892754 ] Mit Desai commented on YARN-1628: - I have a workaround. Will get back with what I find. TestContainerManagerSecurity fails on trunk --- Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1628.1.patch, YARN-1628.patch The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1628) TestContainerManagerSecurity fails on trunk
Mit Desai created YARN-1628: --- Summary: TestContainerManagerSecurity fails on trunk Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 3.0.0 Reporter: Mit Desai Assignee: Mit Desai The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1628) TestContainerManagerSecurity fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1628: Attachment: YARN-1628.patch Attaching the patch. The argument InvalidHost was not a valid host name, which resulted in an UnknownHostException being thrown. TestContainerManagerSecurity fails on trunk --- Key: YARN-1628 URL: https://issues.apache.org/jira/browse/YARN-1628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1628.patch The Test fails with the following error {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
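The failure mode can be reproduced with plain JDK classes: java.net.InetSocketAddress marks a host it cannot resolve as "unresolved", and Hadoop's SecurityUtil.buildTokenService wraps the resulting UnknownHostException in the IllegalArgumentException seen in the stack trace. A small sketch (the host names below are illustrative; ".invalid" is a reserved TLD that is guaranteed not to resolve, unlike an arbitrary string that a wildcard DNS setup might accidentally match):

```java
import java.net.InetSocketAddress;

// Shows how an unresolvable host surfaces before any token code runs:
// the socket address itself reports isUnresolved() == true.
public class UnresolvedHostDemo {
    public static boolean isUnresolved(String host) {
        return new InetSocketAddress(host, 1234).isUnresolved();
    }

    public static void main(String[] args) {
        System.out.println(isUnresolved("some.host.invalid")); // true: cannot resolve
        System.out.println(isUnresolved("127.0.0.1"));         // false: numeric, no DNS
    }
}
```

This is why the test fix is to pass a resolvable (or numeric) address rather than a placeholder string like InvalidHost.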
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838260#comment-13838260 ] Mit Desai commented on YARN-1463: - Just for information, This also causes the TestJHSSecurity#testDeligationToken to fail TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838262#comment-13838262 ] Mit Desai commented on YARN-1463: - Correction: TestJHSSecurity#testDelegationToken TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.000.patch, YARN-1463.v1.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Description: Added in the YARN-1021 patch, the test TestSLSRunner is now failing. (was: Added in the YARn-1021 patch, the test TestSLSRunner is now failing.) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Added in the YARN-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Summary: TestSLSRunner test is failing (was: TestSLSRunner is failing) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Added in the YARn-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785547#comment-13785547 ] Mit Desai commented on YARN-1021: - Hey Wei, FYI, I would like to inform you that the test TestSLSRunner is failing. I have created a new JIRA for that: YARN-1270 Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm performs for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. 
This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters, handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including: * Resource usages for the whole cluster and each queue, which can be utilized to configure cluster and queue capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc). * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
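The "scheduler wrapper" idea above can be sketched with a simple timing decorator: intercept each scheduler operation, measure it, and accumulate per-operation cost. Scheduler and its allocate/handle methods here are stand-ins for illustration, not the real YARN scheduler interface or the SLS wrapper class.

```java
import java.util.HashMap;
import java.util.Map;

// Decorator that times each operation of a wrapped scheduler and
// accumulates total nanoseconds per operation name.
public class TimingSchedulerWrapper {
    public interface Scheduler {
        void allocate();
        void handle();
    }

    private final Scheduler real;
    private final Map<String, Long> totalNanos = new HashMap<>();

    public TimingSchedulerWrapper(Scheduler real) { this.real = real; }

    private void timed(String op, Runnable body) {
        long start = System.nanoTime();
        body.run();
        totalNanos.merge(op, System.nanoTime() - start, Long::sum);
    }

    public void allocate() { timed("allocate", real::allocate); }
    public void handle()   { timed("handle", real::handle); }

    public Map<String, Long> costs() { return totalNanos; }

    public static void main(String[] args) {
        TimingSchedulerWrapper w = new TimingSchedulerWrapper(new Scheduler() {
            public void allocate() { /* simulated scheduling work */ }
            public void handle()   { /* simulated event handling */ }
        });
        w.allocate();
        w.handle();
        w.allocate();
        System.out.println(w.costs().keySet());
    }
}
```

Because the wrapper sits between the event dispatcher and the real scheduler, the measured costs reflect the actual scheduler code paths, which is what makes the per-operation metrics in the description possible.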
[jira] [Updated] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1199: Attachment: YARN-1199.patch Thanks Rob for pointing this out. I have made the changes to the patch and attached it. Can you please review it? Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119.patch Patch posted for trunk Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Assignee: Mit Desai Attachments: YARN-1119.patch, YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Assignee: Mit Desai Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Assignee: Mit Desai Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119-v1-b23.patch Patch posted for branch 0.23. Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-897: --- Attachment: YARN-897-08152013-br-0.23.patch Posted patch for branch-0.23 CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Djellel Eddine Difallah Assignee: Djellel Eddine Difallah Priority: Blocker Fix For: 2.1.0-beta Attachments: TestBugParentQueue.java, YARN-897-08152013-br-0.23.patch, YARN-897-1.patch, YARN-897-2.patch, YARN-897-3.patch, YARN-897-4.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
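The corruption mechanism described above is a general TreeSet pitfall: mutating a field that the comparator reads, while the element is still in the set, silently invalidates the tree's ordering. A minimal demonstration (Queue is a stand-in for the scheduler's child queue class, not the YARN code); the fix pattern, remove / update / re-insert, is what re-sorting on container completion amounts to:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Shows a TreeSet going stale when an in-set element's sort key is
// mutated, and the remove/update/re-insert pattern that keeps it valid.
public class TreeSetKeyMutation {
    static class Queue {
        final String name;
        double usedCapacity;
        Queue(String name, double used) { this.name = name; this.usedCapacity = used; }
    }

    static TreeSet<Queue> newQueueSet() {
        return new TreeSet<>(Comparator
            .comparingDouble((Queue q) -> q.usedCapacity)
            .thenComparing(q -> q.name));
    }

    public static void main(String[] args) {
        TreeSet<Queue> queues = newQueueSet();
        Queue a = new Queue("a", 0.9);
        queues.add(a);
        queues.add(new Queue("b", 0.5));
        queues.add(new Queue("c", 0.1));

        // WRONG: mutate the sort key in place (e.g. containers completed).
        a.usedCapacity = 0.0;
        System.out.println(queues.first().name);  // stale order: not "a"
        System.out.println(queues.remove(a));     // lookup is broken too

        // RIGHT: take the element out, update it, re-insert it.
        TreeSet<Queue> fixed = newQueueSet();
        Queue q = new Queue("q", 0.9);
        fixed.add(q);
        fixed.add(new Queue("r", 0.5));
        fixed.remove(q);
        q.usedCapacity = 0.0;
        fixed.add(q);
        System.out.println(fixed.first().name);   // "q", as expected
    }
}
```

In the scheduler, the stale ordering means an under-capacity queue is no longer found first, which is exactly the starvation the issue describes.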