[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2

2014-03-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935704#comment-13935704
 ] 

Jonathan Eagles commented on YARN-1833:
---

+1. YARN-1830 causes the TestRMRestart error.

 TestRMAdminService Fails in trunk and branch-2
 --

 Key: YARN-1833
 URL: https://issues.apache.org/jira/browse/YARN-1833
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: Test
 Attachments: YARN-1833-v2.patch, YARN-1833.patch


 In the test 
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the 
 following assert is not needed.
 {code}
 Assert.assertTrue(groupWithInit.size() != groupBefore.size());
 {code}
 As the assert takes the default groups for groupWithInit (which in my case 
 are users, sshusers and wheel), it fails because the sizes of groupWithInit 
 and groupBefore are the same.
 I do not think we need this assert here. Moreover, we are also checking 
 that groupWithInit does not contain the userGroups that are in groupBefore, 
 so removing the assert should not be harmful.
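 For illustration, a minimal sketch of the check that remains once the size 
 assert is dropped (variable names follow the description; the surrounding 
 test setup is assumed):
 {code}
 // Sketch only: verify none of the user's previous groups appear in the
 // refreshed (default) group list.
 for (String group : groupBefore) {
   Assert.assertFalse(groupWithInit.contains(group));
 }
 {code}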



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert

2014-03-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1136:
--

Attachment: yarn1136-v1.patch

Kicking the build with an updated patch.

 Replace junit.framework.Assert with org.junit.Assert
 

 Key: YARN-1136
 URL: https://issues.apache.org/jira/browse/YARN-1136
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Chen He
  Labels: newbie, test
 Attachments: yarn1136-v1.patch, yarn1136.patch


 There are several places where we are using junit.framework.Assert instead of 
 org.junit.Assert.
 {code}grep -rn junit.framework.Assert hadoop-yarn-project/ 
 --include=*.java{code} 
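 For anyone picking this up: junit.framework.Assert is the deprecated JUnit 3 
 class and org.junit.Assert is its JUnit 4 replacement, so the change per file 
 is a straight import swap:
 {code}
 // Before (deprecated JUnit 3 class):
 import junit.framework.Assert;
 // After (JUnit 4 replacement):
 import org.junit.Assert;
 {code}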



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-1845) Elapsed time for failed tasks that never started is wrong

2014-03-17 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles moved MAPREDUCE-5797 to YARN-1845:
--

  Component/s: (was: webapps)
   (was: jobhistoryserver)
 Target Version/s: 3.0.0, 2.5.0  (was: 0.23.11, 2.4.0)
Affects Version/s: (was: 0.23.9)
   0.23.9
   Issue Type: Improvement  (was: Bug)
  Key: YARN-1845  (was: MAPREDUCE-5797)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

  Elapsed time for failed tasks that never started is wrong 
 

 Key: YARN-1845
 URL: https://issues.apache.org/jira/browse/YARN-1845
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, 
 patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch


 The elapsed time for tasks in a failed job that were never
 started can be way off.  It looks like we're marking the start time as the
 beginning of the epoch (i.e.: start time = -1) but the finish time is when the
 task was marked as failed when the whole job failed.  That causes the
 calculated elapsed time of the task to be a ridiculous number of hours.
 Tasks that fail without any attempts shouldn't have start/finish/elapsed 
 times.
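 A minimal sketch of the kind of guard that would avoid this, with assumed 
 names rather than the actual patch:
 {code}
 // Hypothetical helper: report no elapsed time for tasks whose start time
 // was never set (startTime == -1).
 static long elapsedTime(long startTime, long finishTime) {
   if (startTime < 0) {
     return -1;                    // never started: no meaningful elapsed time
   }
   return finishTime - startTime;  // both timestamps are real
 }
 {code}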



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1845) Elapsed time for failed tasks that never started is wrong

2014-03-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938017#comment-13938017
 ] 

Jonathan Eagles commented on YARN-1845:
---

+1. lgtm. Thanks for the patch, Rushabh. Committing this to branch-2 and trunk.

  Elapsed time for failed tasks that never started is wrong 
 

 Key: YARN-1845
 URL: https://issues.apache.org/jira/browse/YARN-1845
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, 
 patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch


 The elapsed time for tasks in a failed job that were never
 started can be way off.  It looks like we're marking the start time as the
 beginning of the epoch (i.e.: start time = -1) but the finish time is when the
 task was marked as failed when the whole job failed.  That causes the
 calculated elapsed time of the task to be a ridiculous number of hours.
 Tasks that fail without any attempts shouldn't have start/finish/elapsed 
 times.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1845) Elapsed time for failed tasks that never started is wrong

2014-03-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938016#comment-13938016
 ] 

Jonathan Eagles commented on YARN-1845:
---

Moved this to YARN to better reflect where the changes are taking place.

  Elapsed time for failed tasks that never started is wrong 
 

 Key: YARN-1845
 URL: https://issues.apache.org/jira/browse/YARN-1845
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, 
 patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch


 The elapsed time for tasks in a failed job that were never
 started can be way off.  It looks like we're marking the start time as the
 beginning of the epoch (i.e.: start time = -1) but the finish time is when the
 task was marked as failed when the whole job failed.  That causes the
 calculated elapsed time of the task to be a ridiculous number of hours.
 Tasks that fail without any attempts shouldn't have start/finish/elapsed 
 times.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938352#comment-13938352
 ] 

Jonathan Eagles commented on YARN-1769:
---

The TestResourceTrackerService test issue is caused by YARN-1591.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers when there might not currently be enough space available 
 on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required, and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit on the number reserved, it stops 
 looking at any other nodes. This can result in missing nodes that have 
 enough space to fulfill the request.   
 The other place for improvement is that reservations currently count against 
 your queue capacity.  If you have reservations you could hit the various 
 limits, which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to get its resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 
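 A rough sketch of the proposed behavior, in hypothetical scheduler terms 
 (none of these names are from the actual patch):
 {code}
 // Pseudocode: on each node update, prefer converting an existing reservation
 // into a real allocation when the node now fits the request, instead of
 // stopping once the reservation limit is reached.
 void assignContainers(Node node, Application app, Resource required) {
   if (fits(required, node.getAvailableResource())) {  // fits(): hypothetical
     if (app.hasReservedContainer()) {
       unreserve(app, node);          // swap the reservation out...
     }
     allocate(app, node, required);   // ...for an actual allocation
   } else {
     maybeReserve(app, node);         // otherwise keep (re-)reserving
   }
 }
 {code}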



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1833) TestRMAdminService Fails in trunk and branch-2

2014-03-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1833:
--

Fix Version/s: 2.4.0

 TestRMAdminService Fails in trunk and branch-2
 --

 Key: YARN-1833
 URL: https://issues.apache.org/jira/browse/YARN-1833
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: Test
 Fix For: 3.0.0, 2.4.0, 2.5.0

 Attachments: YARN-1833-v2.patch, YARN-1833.patch


 In the test 
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the 
 following assert is not needed.
 {code}
 Assert.assertTrue(groupWithInit.size() != groupBefore.size());
 {code}
 As the assert takes the default groups for groupWithInit (which in my case 
 are users, sshusers and wheel), it fails because the sizes of groupWithInit 
 and groupBefore are the same.
 I do not think we need this assert here. Moreover, we are also checking 
 that groupWithInit does not contain the userGroups that are in groupBefore, 
 so removing the assert should not be harmful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1833) TestRMAdminService Fails in trunk and branch-2

2014-03-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940543#comment-13940543
 ] 

Jonathan Eagles commented on YARN-1833:
---

Added this test-only fix to the 2.4.0 release since it is really hindering my 
testing efforts on that line. 

 TestRMAdminService Fails in trunk and branch-2
 --

 Key: YARN-1833
 URL: https://issues.apache.org/jira/browse/YARN-1833
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: Test
 Fix For: 3.0.0, 2.4.0, 2.5.0

 Attachments: YARN-1833-v2.patch, YARN-1833.patch


 In the test 
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the 
 following assert is not needed.
 {code}
 Assert.assertTrue(groupWithInit.size() != groupBefore.size());
 {code}
 As the assert takes the default groups for groupWithInit (which in my case 
 are users, sshusers and wheel), it fails because the sizes of groupWithInit 
 and groupBefore are the same.
 I do not think we need this assert here. Moreover, we are also checking 
 that groupWithInit does not contain the userGroups that are in groupBefore, 
 so removing the assert should not be harmful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943421#comment-13943421
 ] 

Jonathan Eagles commented on YARN-1670:
---

Mit, I'm worried that we are still going to have this issue, except in the 
opposite way. On the last read that puts us over the initial filelength, we are 
not going to write the last part of the data that still fits within the 
original filelength. In this case our aggregated log file length will be 
smaller than the filelength written to the data structure.

jeagles

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where, if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
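 To make the curRead point concrete, a self-contained sketch under assumed 
 names (not the shipped code):
 {code}
 // With an int, curRead wraps negative past Integer.MAX_VALUE (2 GB), so the
 // bound check misbehaves for very large aggregated logs; hence the long.
 static void dumpUpTo(java.io.InputStream in, java.io.PrintStream out,
     long fileLength) throws java.io.IOException {
   byte[] buf = new byte[65536];
   long curRead = 0;                 // was declared as an int in the bug above
   int len = in.read(buf);
   while (len != -1 && curRead < fileLength) {
     out.write(buf, 0, len);         // like the original, the last read may
     curRead += len;                 // overshoot; truncation is a separate fix
     len = in.read(buf);
   }
 }
 {code}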



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reopened YARN-1670:
---


 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where, if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943422#comment-13943422
 ] 

Jonathan Eagles commented on YARN-1670:
---

I've reopened this ticket to verify the correctness of the patch that went into 
branch-2 and branch-2.4.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where, if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-23 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944720#comment-13944720
 ] 

Jonathan Eagles commented on YARN-1670:
---

Thanks, [~mdesai]. The above logic seems correct now. Two minor things.
 - If we move from a count-up byte counter to a count-down byte counter, does 
this seem easier to understand?
{code}
long bytesLeft = file.length();
while ((len = in.read(buf)) != -1) {
  //If buffer contents within fileLength, write
  if (len < bytesLeft) {
out.write(buf, 0, len);
bytesLeft -= len;
  }
  //else only write contents that are within fileLength, then exit early 
  else {
out.write(buf, 0, (int)bytesLeft);
break;
  }
}
{code}
 - I see a buffer size of 65535 being used (I know, not your code). I wonder 
if this is really intended to be block aligned (64K, i.e. 65536), since that 
would give theoretically optimal read performance.
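Concretely, the alignment point is a one-byte difference:
{code}
byte[] buf = new byte[65535];      // current: 64K - 1, not block aligned
byte[] aligned = new byte[65536];  // 64K: block aligned, as suggested above
{code}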

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where, if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945436#comment-13945436
 ] 

Jonathan Eagles commented on YARN-1670:
---

+1 on this change. Committing to trunk, branch-2.4, branch-2, branch-0.23.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, 
 YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at 
 org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.  
 Inside of the write() routine in LogValue, it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where, if someone is still 
 writing to the file when it goes to be aggregated, the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2014-03-27 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1426:
--

Attachment: YARN-1426.patch

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.patch, YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1106) The RM should point the tracking url to the RM app page if its empty

2014-03-27 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1106:
--

Attachment: YARN-1106.patch

[~tgraves]. I tried the latest patch against trunk but the tests now fail since 
the originalTrackingUrl is set to N/A and not null or empty. If we still want 
this behavior, we will need to add this condition as well.
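Concretely, the extra condition would look something like this (a sketch; the 
names are assumed from context, not the patch):
{code}
// Treat "N/A" the same as a missing tracking URL and fall back to the RM app page.
String trackingUrl = appAttempt.getOriginalTrackingUrl();  // assumed accessor
if (trackingUrl == null || trackingUrl.isEmpty() || "N/A".equals(trackingUrl)) {
  trackingUrl = rmAppPageUrl;  // assumed: URL of the RM's page for this app
}
{code}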

 The RM should point the tracking url to the RM app page if its empty
 

 Key: YARN-1106
 URL: https://issues.apache.org/jira/browse/YARN-1106
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1106.patch, YARN-1106.patch


 It would be nice if the Resourcemanager set the tracking url to the RM app 
 page if the application master doesn't pass one or passes the empty string.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups

2014-03-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951270#comment-13951270
 ] 

Jonathan Eagles commented on YARN-1883:
---

+1. Thanks for cleaning this test up. The double brace initialization that 
was there before is considered a hack, since it creates an anonymous 
subclass with an instance initializer block. Committing to trunk and branch-2.

jeagles
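
For readers unfamiliar with the pattern, this is what double brace 
initialization looks like next to a plain alternative (illustrative, not the 
test's exact contents):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DoubleBraceExample {
  // The hack: an anonymous ArrayList subclass whose instance initializer
  // block adds the elements.
  static List<String> hack = new ArrayList<String>() {{
    add("test_group_A");
    add("test_group_B");
  }};

  // The cleaner equivalent: no hidden subclass.
  static List<String> plain =
      new ArrayList<String>(Arrays.asList("test_group_A", "test_group_B"));
}
{code}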

 TestRMAdminService fails due to inconsistent entries in UserGroups
 --

 Key: YARN-1883
 URL: https://issues.apache.org/jira/browse/YARN-1883
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7
 Attachments: YARN-1883.patch, YARN-1883.patch


 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails 
 with the following error:
 {noformat}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
 {noformat}
 Line numbers may be inconsistent, as I was testing with the tests running in 
 a particular order. The line on which the failure occurs is
 {code}
 Assert.assertTrue(groupBefore.contains("test_group_A")
  && groupBefore.contains("test_group_B")
  && groupBefore.contains("test_group_C") && groupBefore.size() == 3);
 {code}
 testRMInitialsWithFileSystemBasedConfigurationProvider() and
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
 call the function {{MockUnixGroupsMapping.updateGroups();}}, which changes 
 the list of userGroups.
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() 
 tries to verify the groups before changing them, and fails if 
 testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made 
 the changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2

2014-04-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962175#comment-13962175
 ] 

Jonathan Eagles commented on YARN-1906:
---

Mit, you might consider using waitForState instead of a raw sleep. This will 
protect us in the case of a missed race condition, though it will perhaps 
result in more sleep time overall.
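
A sketch of the suggestion (illustrative; not the MockRM API): poll for the 
expected condition with a bounded timeout instead of a fixed Thread.sleep.
{code}
class WaitForState {
  interface Condition { boolean reached(); }

  static void waitFor(Condition c, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!c.reached()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("timed out waiting for expected state");
      }
      Thread.sleep(100);  // short poll; worst case bounded by timeoutMs
    }
  }
}
{code}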

 TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and 
 branch2
 ---

 Key: YARN-1906
 URL: https://issues.apache.org/jira/browse/YARN-1906
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1906.patch


 Here is the output of the failure:
 {noformat}
 testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
   Time elapsed: 9.757 sec   FAILURE!
 java.lang.AssertionError: expected:<2> but was:<1>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects

2014-05-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996825#comment-13996825
 ] 

Jonathan Eagles commented on YARN-1981:
---

+1. lgtm. Committing to branch-2 and trunk. Thanks, [~jlowe].

 Nodemanager version is not updated when a node reconnects
 -

 Key: YARN-1981
 URL: https://issues.apache.org/jira/browse/YARN-1981
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1981.patch


 When a nodemanager is quickly restarted and happens to change versions during 
 the restart (e.g.: rolling upgrade scenario) the NM version as reported by 
 the RM is not updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-06-11 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027938#comment-14027938
 ] 

Jonathan Eagles commented on YARN-1198:
---

Since the headroom calculation is used for reducer preemption, I have seen 
these bugs cause queue deadlock, where a multi-job queue is full of reducers 
that can't finish because the mappers can't run, due to reducers having higher 
task priority. Preemption doesn't kill reducers since the headroom falsely 
shows there is plenty of room in the queue for mappers to run.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially a lot of situations which are not considered in 
 this calculation:
 * If a container finishes, then the headroom for that application will change 
 and the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue, so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then that would not be backward compatible...).
 * Also, when an admin refreshes the queue, headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-06-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1857:
--

Priority: Critical  (was: Major)

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by application masters from other users.  
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 you have multiple users submitting applications: one very large application 
 by user 1 starts up, runs most of its maps, and starts running reducers. 
 Other users try to start applications and get their application masters 
 started but no tasks.  The very large application then gets to the point 
 where it has consumed the rest of the cluster resources with all reduces.  
 But at this point it still needs to finish a few maps.  The headroom being 
 sent to this application is only based on the user limit (which is 100% of 
 the cluster capacity); it's using, let's say, 95% of the cluster for reduces 
 and the other 5% is being used by other users running application masters.  
 The MRAppMaster thinks it still has 5%, so it doesn't know that it should 
 kill a reduce in order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.
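 To make the arithmetic concrete, a sketch under assumed names (not the 
 eventual fix):
 {code}
 // headroom = userLimit - userConsumed ignores other users' AM containers, so
 // it can stay positive even when the queue is full. A fix along the lines
 // discussed would also cap it by what is genuinely free in the queue.
 Resource headroom = Resources.subtract(userLimit, userConsumed);
 Resource actuallyFree = Resources.subtract(queueCapacity, queueUsed);
 headroom = Resources.min(calculator, clusterResource, headroom, actuallyFree);
 {code}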



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-06-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1857:
--

Target Version/s: 2.4.1  (was: 2.4.0)

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
 Attachments: YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by application masters from other users.  
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 you have multiple users submitting applications: one very large application 
 by user 1 starts up, runs most of its maps, and starts running reducers. 
 Other users try to start applications and get their application masters 
 started but no tasks.  The very large application then gets to the point 
 where it has consumed the rest of the cluster resources with all reduces.  
 But at this point it still needs to finish a few maps.  The headroom being 
 sent to this application is only based on the user limit (which is 100% of 
 the cluster capacity); it's using, let's say, 95% of the cluster for reduces 
 and the other 5% is being used by other users running application masters.  
 The MRAppMaster thinks it still has 5%, so it doesn't know that it should 
 kill a reduce in order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-06-11 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027943#comment-14027943
 ] 

Jonathan Eagles commented on YARN-1857:
---

Bumping the priority since reducer preemption is broken in many cases without 
this fix.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by application masters from other users.  
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 you have multiple users submitting applications: one very large application 
 by user 1 starts up, runs most of its maps, and starts running reducers. 
 Other users try to start applications and get their application masters 
 started but no tasks.  The very large application then gets to the point 
 where it has consumed the rest of the cluster resources with all reduces.  
 But at this point it still needs to finish a few maps.  The headroom being 
 sent to this application is only based on the user limit (which is 100% of 
 the cluster capacity); it's using, let's say, 95% of the cluster for reduces 
 and the other 5% is being used by other users running application masters.  
 The MRAppMaster thinks it still has 5%, so it doesn't know that it should 
 kill a reduce in order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2184) ResourceManager may fail due to name node in safe mode

2014-06-20 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038912#comment-14038912
 ] 

Jonathan Eagles commented on YARN-2184:
---

Jeff, this issue has already been reported by me under YARN-2035 and there is a 
patch available. Let me know if this solves your issue and we can close this 
ticket out.
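
For context, the shape of the mitigation is along these lines (illustrative 
only; see YARN-2035 for the actual approach):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: retry the mkdirs during service init instead of failing outright
// while the name node is still in safe mode.
class SafeModeRetry {
  static void createWithRetry(Configuration conf, Path rootPath)
      throws IOException, InterruptedException {
    FileSystem fs = rootPath.getFileSystem(conf);
    for (int attempt = 1; ; attempt++) {
      try {
        fs.mkdirs(rootPath);     // created, or already exists
        return;                  // init can proceed
      } catch (IOException e) {  // e.g. SafeModeException via RemoteException
        if (attempt >= 10) {
          throw e;               // bounded retries, then fail as before
        }
        Thread.sleep(5000);      // give the NN time to leave safe mode
      }
    }
  }
}
{code}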

 ResourceManager may fail due to name node in safe mode
 --

 Key: YARN-2184
 URL: https://issues.apache.org/jira/browse/YARN-2184
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 If the history service is enabled in the resourcemanager, it will try to 
 mkdir when the service is initialized. At that time the name node may still 
 be in safe mode, which can cause the history service to fail and then cause 
 the resourcemanager to fail. This is very likely when the cluster is 
 restarted, since the namenode can stay in safe mode for a long time.
 Here's the error logs:
 {code}
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
  Cannot create directory 
 /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot.
  Name node is in safe mode.
 The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. 
 The number of live datanodes 1 has reached the minimum number 0. In safe mode 
 extension. Safe mode will be turned off automatically in 19 seconds.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 at org.apache.hadoop.ipc.Client.call(Client.java:1410)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
 at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
 at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
 at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 ... 10 more
 2014-06-20 11:06:25,220 INFO 
 {code}

[jira] [Created] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-10 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2277:
-

 Summary: Add JSONP support to the ATS REST API
 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles


As the Application Timeline Server is provided with a built-in UI, it may make 
sense to enable JSONP REST API capabilities to allow a remote UI to access the 
data directly via javascript without cross-site browser blocks coming into 
play.

An example client may be something like
http://api.jquery.com/jQuery.getJSON/ 

This can alleviate the need to create a local proxy cache.
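
A minimal sketch of what JSONP support amounts to on the server side 
(illustrative; not the attached patch):
{code}
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// If the request names a callback, wrap the JSON body in that function call
// so a cross-origin <script> tag (e.g. jQuery.getJSON with "callback=?")
// can consume it.
class JsonpWriter {
  static void writeJson(HttpServletRequest req, HttpServletResponse resp,
      String json) throws IOException {
    String callback = req.getParameter("callback");
    if (callback != null && !callback.isEmpty()) {
      resp.setContentType("application/javascript");
      resp.getWriter().write(callback + "(" + json + ")");
    } else {
      resp.setContentType("application/json");
      resp.getWriter().write(json);
    }
  }
}
{code}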



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-10 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277.patch

Starter patch to get the conversation started.

 Add JSONP support to the ATS REST API
 -

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP REST API capabilities to allow a remote UI to access the 
 data directly via javascript without cross-site browser blocks coming into 
 play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-10 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058366#comment-14058366
 ] 

Jonathan Eagles commented on YARN-2277:
---

A brief discussion of the options: 

http://jvaneyck.wordpress.com/2014/01/07/cross-domain-requests-in-javascript/

The JSONP method is already being used as part of jmx queries, so I felt this 
was most consistent with the current system. I'm in no way married to this 
approach.
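
For comparison, the CORS route is mostly a response-header concern. A minimal, 
deliberately permissive servlet-filter sketch (illustrative; not the attached 
YARN-2277-CORS.patch):
{code}
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

public class SimpleCorsFilter implements Filter {
  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}
  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    // Permissive for illustration; a real deployment would restrict origins.
    ((HttpServletResponse) res).setHeader("Access-Control-Allow-Origin", "*");
    chain.doFilter(req, res);
  }
}
{code}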

 Add JSONP support to the ATS REST API
 -

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP REST API capabilities to allow a remote UI to access the 
 data directly via javascript without cross-site browser blocks coming into 
 play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277-CORS.patch

 Add JSONP support to the ATS REST API
 -

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP REST API capabilities to allow a remote UI to access the 
 data directly via javascript without cross-site browser blocks coming into 
 play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: (was: YARN-2277-CORS.patch)

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles

 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277-JSONP.patch

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277-JSONP.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Description: 
As the Application Timeline Server is provided with a built-in UI, it may make 
sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
access the data directly via javascript without cross-site browser blocks 
coming into play.

An example client may be something like
http://api.jquery.com/jQuery.getJSON/ 

This can alleviate the need to create a local proxy cache.

  was:
As the Application Timeline Server is provided with a built-in UI, it may make 
sense to enable JSONP REST API capabilities to allow a remote UI to access the 
data directly via javascript without cross-site browser blocks coming into 
play.

An example client may be something like
http://api.jquery.com/jQuery.getJSON/ 

This can alleviate the need to create a local proxy cache.

Summary: Add Cross-Origin support to the ATS REST API  (was: Add JSONP 
support to the ATS REST API)

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles

 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: (was: YARN-2277.patch)

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles

 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277-CORS.patch

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-07-11 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059205#comment-14059205
 ] 

Jonathan Eagles commented on YARN-2277:
---

[~vinodkv] and [~zjshen], do you have any thoughts on the approach taken?

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP or CORS REST API capabilities to allow a remote UI to 
 access the data directly via javascript without cross-site browser blocks 
 coming into play.
 An example client may be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-819) ResourceManager and NodeManager should check for a minimum allowed version

2013-09-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779097#comment-13779097
 ] 

Jonathan Eagles commented on YARN-819:
--

+1. Great fix, Rob.

 ResourceManager and NodeManager should check for a minimum allowed version
 --

 Key: YARN-819
 URL: https://issues.apache.org/jira/browse/YARN-819
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Robert Parker
Assignee: Robert Parker
 Attachments: YARN-819-1.patch, YARN-819-2.patch, YARN-819-3.patch


 Our use case: during an upgrade on a large cluster, several NodeManagers may 
 not restart with the new version.  Once the RM comes back up, the NodeManager 
 will re-register with the RM without issue.
 The NM should report its version to the RM.  The RM should have a 
 configuration to disable the check (default), require a version equal to the 
 RM (to prevent a config change for each release), equal to or greater than 
 the RM (to allow NM upgrades), and finally an explicit version or version 
 range.
 The RM should also have a configuration for how to treat a mismatch: 
 REJECT, or REBOOT the NM.
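 As a rough sketch of the proposed check (the policy enum and comparison are 
 assumptions based on the description above, not the attached patches):
 {code}
// Hypothetical version policy mirroring the description: no check
// (default), equal to the RM, or equal-to-or-greater than the RM.
public class NodeVersionCheck {
  enum Policy { NONE, EQUAL, EQUAL_OR_GREATER }

  // Returns true if a registering NM's version is acceptable to the RM.
  static boolean isCompatible(String nmVersion, String rmVersion,
      Policy policy) {
    switch (policy) {
      case NONE:              return true;
      case EQUAL:             return nmVersion.equals(rmVersion);
      case EQUAL_OR_GREATER:  return compare(nmVersion, rmVersion) >= 0;
      default:                return false;
    }
  }

  // Naive dotted-version comparison; assumes purely numeric components.
  static int compare(String a, String b) {
    String[] pa = a.split("\\."), pb = b.split("\\.");
    for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
      int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (va != vb) {
        return Integer.compare(va, vb);
      }
    }
    return 0;
  }
}
 {code}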

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1199) Make NM/RM Versions Available

2013-09-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779169#comment-13779169
 ] 

Jonathan Eagles commented on YARN-1199:
---

I have submitted this patch now that YARN-819 is in. Will check in pending a 
+1 from Hadoop QA.

 Make NM/RM Versions Available
 -

 Key: YARN-1199
 URL: https://issues.apache.org/jira/browse/YARN-1199
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch


 Now as we have the NM and RM Versions available, we can display the YARN 
 version of nodes running in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1243) ResourceManager: Error in handling event type NODE_UPDATE to the scheduler - NPE at SchedulerApp.java:411

2013-09-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779187#comment-13779187
 ] 

Jonathan Eagles commented on YARN-1243:
---

+1. Verified backport to branch-0.23 and ran tests.

 ResourceManager: Error in handling event type NODE_UPDATE to the scheduler - 
 NPE at SchedulerApp.java:411
 -

 Key: YARN-1243
 URL: https://issues.apache.org/jira/browse/YARN-1243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.8
 Environment: RHEL - 6.4, Hadoop 0.23.8
Reporter: Sanjay Upadhyay
Assignee: Jason Lowe
 Attachments: YARN-1243.branch-0.23.patch


 2013-09-26 03:25:02,262 [ResourceManager Event Processor] FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.unreserve(SchedulerApp.java:411)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1333)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1261)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1137)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1092)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:887)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:788)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:594)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
 at java.lang.Thread.run(Thread.java:722)
 Yarn Resource manager exits at this NPE

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-677) Increase coverage to FairScheduler

2013-10-02 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-677:
-

Summary: Increase coverage to FairScheduler  (was: Add test methods in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-02 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784086#comment-13784086
 ] 

Jonathan Eagles commented on YARN-677:
--

+1. Thanks for the coverage addition for this component.

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-10-02 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784149#comment-13784149
 ] 

Jonathan Eagles commented on YARN-465:
--

I haven't looked too closely at this, but I see a setAccessible call. This is 
the same technique that powermock uses to access fields, which has been a 
disallowed testing technique in the hadoop stack. The reason is that it 
usually points to an improvement that should be made to the class under test.

 fix coverage  org.apache.hadoop.yarn.server.webproxy
 

 Key: YARN-465
 URL: https://issues.apache.org/jira/browse/YARN-465
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-465-branch-0.23-a.patch, 
 YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
 YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch


 fix coverage  org.apache.hadoop.yarn.server.webproxy
 patch YARN-465-trunk.patch for trunk
 patch YARN-465-branch-2.patch for branch-2
 patch YARN-465-branch-0.23.patch for branch-0.23
 There is an issue in branch-0.23: the patch does not create the .keep file.
 To fix it, run the following commands:
 mkdir 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
 touch 
 yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785276#comment-13785276
 ] 

Jonathan Eagles commented on YARN-677:
--

Thanks, Sandy. Let me take a look at the coverage numbers from before this 
patch went in. In the meantime I will revert until I can show we need this 
coverage patch.

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Fix For: 3.0.0, 2.3.0

 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-677) Increase coverage to FairScheduler

2013-10-03 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-677:
-

Fix Version/s: (was: 2.3.0)
   (was: 3.0.0)

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1199) Make NM/RM Versions Available

2013-10-03 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785526#comment-13785526
 ] 

Jonathan Eagles commented on YARN-1199:
---

+1. Thanks, Mit.

 Make NM/RM Versions Available
 -

 Key: YARN-1199
 URL: https://issues.apache.org/jira/browse/YARN-1199
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, 
 YARN-1199.patch


 Now as we have the NM and RM Versions available, we can display the YARN 
 version of nodes running in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently

2013-10-22 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801883#comment-13801883
 ] 

Jonathan Eagles commented on YARN-1183:
---

Great work, everybody. Looks like this patch is ready for checkin. I am 
assuming this is targeted for trunk and branch-2. Also, can you post a maven 
command for manual testing? I would be happy to put this in.

 MiniYARNCluster shutdown takes several minutes intermittently
 -

 Key: YARN-1183
 URL: https://issues.apache.org/jira/browse/YARN-1183
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, 
 YARN-1183--n4.patch, YARN-1183.patch


 As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java 
 processes living for several minutes after successful completion of the 
 corresponding test. There is a concurrency issue in MiniYARNCluster shutdown 
 logic which leads to this. Sometimes the RM stops before an app master sends 
 its last report, and then the app master keeps retrying for 6 minutes. In some 
 cases it leads to failures in subsequent tests, and it affects performance of 
 tests as app masters eat resources.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently

2013-10-22 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802170#comment-13802170
 ] 

Jonathan Eagles commented on YARN-1183:
---

Can you post an updated patch so I can check in? The current one doesn't apply 
after YARN-1182.

 MiniYARNCluster shutdown takes several minutes intermittently
 -

 Key: YARN-1183
 URL: https://issues.apache.org/jira/browse/YARN-1183
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, 
 YARN-1183--n4.patch, YARN-1183.patch


 As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java 
 processes living for several minutes after successful completion of the 
 corresponding test. There is a concurrency issue in MiniYARNCluster shutdown 
 logic which leads to this. Sometimes the RM stops before an app master sends 
 its last report, and then the app master keeps retrying for 6 minutes. In some 
 cases it leads to failures in subsequent tests, and it affects performance of 
 tests as app masters eat resources.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently

2013-10-22 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802282#comment-13802282
 ] 

Jonathan Eagles commented on YARN-1183:
---

I'm +1 on YARN-1183--n5.patch. Thanks Andrey and Karthik for getting this patch 
ready!

 MiniYARNCluster shutdown takes several minutes intermittently
 -

 Key: YARN-1183
 URL: https://issues.apache.org/jira/browse/YARN-1183
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov
 Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, 
 YARN-1183--n4.patch, YARN-1183--n5.patch, YARN-1183.patch


 As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java 
 processes living for several minutes after successful completion of the 
 corresponding test. There is a concurrency issue in MiniYARNCluster shutdown 
 logic which leads to this. Sometimes the RM stops before an app master sends 
 its last report, and then the app master keeps retrying for 6 minutes. In some 
 cases it leads to failures in subsequent tests, and it affects performance of 
 tests as app masters eat resources.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-10-23 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803666#comment-13803666
 ] 

Jonathan Eagles commented on YARN-473:
--

I haven't seen any updates on this, so I am assigning it to another 
contributor. Feel free to chime in if you still want this. I'd like to get 
this committed in the next week or so.

 Capacity Scheduler webpage and REST API not showing correct number of pending 
 applications
 --

 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Timothy Chen
  Labels: usability

 The Capacity Scheduler REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
  is not returning the correct number of pending applications.  
 numPendingApplications is almost always zero, even if there are dozens of 
 pending apps.
 In investigating this, I discovered that the Resource Manager's Scheduler 
 webpage is also showing an incorrect but different number of pending 
 applications.  For example, the cluster I'm looking at right now currently 
 has 15 applications in the ACCEPTED state, but the Cluster Metrics table near 
 the top of the page says there are only 2 pending apps.  The REST API says 
 there are zero pending apps.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-10-23 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-473:
-

Assignee: Mit Desai  (was: Timothy Chen)

 Capacity Scheduler webpage and REST API not showing correct number of pending 
 applications
 --

 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Mit Desai
  Labels: usability

 The Capacity Scheduler REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
  is not returning the correct number of pending applications.  
 numPendingApplications is almost always zero, even if there are dozens of 
 pending apps.
 In investigating this, I discovered that the Resource Manager's Scheduler 
 webpage is also showing an incorrect but different number of pending 
 applications.  For example, the cluster I'm looking at right now currently 
 has 15 applications in the ACCEPTED state, but the Cluster Metrics table near 
 the top of the page says there are only 2 pending apps.  The REST API says 
 there are zero pending apps.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809307#comment-13809307
 ] 

Jonathan Eagles commented on YARN-1031:
---

+1. Verified Jason's changes. Blocked access to ajax.googleapis.com via 
/etc/hosts before and after the change for visual inspection. Programmatically 
scanned network activity via firebug to verify the new jquery-ui.css and icons 
are downloaded locally with no GETs to ajax.googleapis.com.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1386) NodeManager mistakenly loses resources and relocalizes them

2013-11-12 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820682#comment-13820682
 ] 

Jonathan Eagles commented on YARN-1386:
---

+1. Great fix, Jason.

 NodeManager mistakenly loses resources and relocalizes them
 ---

 Key: YARN-1386
 URL: https://issues.apache.org/jira/browse/YARN-1386
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1386.patch, YARN-1386.patch


 When a local resource that should already be present is requested again, the 
 nodemanager checks to see if it is still present.  However, the method it uses 
 to check for presence is File.exists(), run as the user of the nodemanager 
 process. If the resource was a private resource localized for another user, 
 it will be localized to a location that is not accessible by the nodemanager 
 user.  Therefore File.exists() returns false, the nodemanager mistakenly 
 believes the resource is no longer available, and it proceeds to localize it 
 over and over.
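 To see why the check misfires, a small standalone sketch (the path is made 
 up): File.exists() returns false both when a file is missing and when the 
 caller simply cannot traverse the path, and the two cases are 
 indistinguishable:
 {code}
import java.io.File;

public class ExistsCheckPitfall {
  public static void main(String[] args) {
    // Hypothetical private localized resource owned by another user.
    File resource = new File(
        "/tmp/nm-local-dir/usercache/otheruser/filecache/10/job.jar");
    // Prints false if an intermediate directory is not traversable by
    // the current user, even though the file actually exists -- so the
    // nodemanager relocalizes a resource it already has.
    System.out.println("exists() = " + resource.exists());
  }
}
 {code}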



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1386) NodeManager mistakenly loses resources and relocalizes them

2013-11-13 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1386:
--

Fix Version/s: 2.2.1

 NodeManager mistakenly loses resources and relocalizes them
 ---

 Key: YARN-1386
 URL: https://issues.apache.org/jira/browse/YARN-1386
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 3.0.0, 2.3.0, 0.23.10, 2.2.1

 Attachments: YARN-1386.patch, YARN-1386.patch


 When a local resource that should already be present is requested again, the 
 nodemanager checks to see if it is still present.  However, the method it uses 
 to check for presence is File.exists(), run as the user of the nodemanager 
 process. If the resource was a private resource localized for another user, 
 it will be localized to a location that is not accessible by the nodemanager 
 user.  Therefore File.exists() returns false, the nodemanager mistakenly 
 believes the resource is no longer available, and it proceeds to localize it 
 over and over.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Moved] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7

2013-11-15 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles moved MAPREDUCE-5630 to YARN-1419:
--

  Component/s: (was: scheduler)
   scheduler
 Target Version/s: 3.0.0, 2.3.0, 0.23.10  (was: 3.0.0, 2.3.0, 0.23.10)
Affects Version/s: (was: 0.23.10)
   (was: 2.3.0)
   (was: 3.0.0)
   0.23.10
   2.3.0
   3.0.0
  Key: YARN-1419  (was: MAPREDUCE-5630)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 
 

 Key: YARN-1419
 URL: https://issues.apache.org/jira/browse/YARN-1419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.3.0, 0.23.10
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Minor
  Labels: java7

 QueueMetrics holds its data in a static variable, causing metrics to bleed 
 over from test to test. clearQueueMetrics is to be called by tests that need 
 to measure metrics correctly for a single test. jdk7 comes into play since 
 tests are run out of order, and in this case that makes the metrics 
 unreliable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7

2013-11-15 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1419:
--

Attachment: YARN-1419.patch

 TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 
 

 Key: YARN-1419
 URL: https://issues.apache.org/jira/browse/YARN-1419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.3.0, 0.23.10
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Minor
  Labels: java7
 Attachments: YARN-1419.patch


 QueueMetrics holds its data in a static variable, causing metrics to bleed 
 over from test to test. clearQueueMetrics is to be called by tests that need 
 to measure metrics correctly for a single test. jdk7 comes into play since 
 tests are run out of order, and in this case that makes the metrics 
 unreliable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7

2013-11-15 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1419:
--

Attachment: YARN-1419.patch

Instead of heavily changing the QueueMetrics class, with its use of static 
class variables and the fact that it does not unregister its beans, I've 
chosen to take a simpler approach of just measuring the delta in apps 
submitted.
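A rough sketch of that delta-based pattern (the counter here is a stand-in, 
not the real QueueMetrics API):
{code}
import org.junit.Assert;
import org.junit.Test;

public class DeltaMetricsExample {
  // Stand-in for a statically registered counter that survives across
  // tests, the way QueueMetrics state does.
  static int appsSubmitted = 0;

  static void submitApp() {
    appsSubmitted++;  // side effect of the action under test
  }

  @Test
  public void testSubmissionCountedOnce() {
    // Record the counter first so state left over from earlier tests
    // (in any execution order) cannot break the assertion.
    int before = appsSubmitted;
    submitApp();
    Assert.assertEquals(1, appsSubmitted - before);
  }
}
{code}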

 TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 
 

 Key: YARN-1419
 URL: https://issues.apache.org/jira/browse/YARN-1419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.3.0, 0.23.10
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Minor
  Labels: java7
 Attachments: YARN-1419.patch, YARN-1419.patch


 QueueMetrics holds its data in a static variable, causing metrics to bleed 
 over from test to test. clearQueueMetrics is to be called by tests that need 
 to measure metrics correctly for a single test. jdk7 comes into play since 
 tests are run out of order, and in this case that makes the metrics 
 unreliable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-1426:
-

 Summary: YARN Components need to unregister their beans upon 
shutdown
 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails

2013-11-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827378#comment-13827378
 ] 

Jonathan Eagles commented on YARN-1420:
---

I ran git bisect on my mac using jdk 1.6 to detect when this test failure was 
introduced. YARN-1343 is the likely culprit. I haven't run this test on linux 
with jdk 1.6, but I suspect there are in fact two issues.

 TestRMContainerAllocator#testUpdatedNodes fails
 ---

 Key: YARN-1420
 URL: https://issues.apache.org/jira/browse/YARN-1420
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
 {code}
 Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec 
  FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) 
  Time elapsed: 3.125 sec   FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
 {code}
 This assertion fails:
 {code}
 Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
 {code}
 The List returned by allocator.getJobUpdatedNodeEvents() is:
 [EventType: JOB_UPDATED_NODES]



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails

2013-11-20 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reassigned YARN-1420:
-

Assignee: Jonathan Eagles

 TestRMContainerAllocator#testUpdatedNodes fails
 ---

 Key: YARN-1420
 URL: https://issues.apache.org/jira/browse/YARN-1420
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Eagles

 From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
 {code}
 Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec 
  FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) 
  Time elapsed: 3.125 sec   FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
 {code}
 This assertion fails:
 {code}
 Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
 {code}
 The List returned by allocator.getJobUpdatedNodeEvents() is:
 [EventType: JOB_UPDATED_NODES]



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1420) TestRMContainerAllocator#testUpdatedNodes fails

2013-11-20 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1420:
--

Attachment: YARN-1420.patch

 TestRMContainerAllocator#testUpdatedNodes fails
 ---

 Key: YARN-1420
 URL: https://issues.apache.org/jira/browse/YARN-1420
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Eagles
 Attachments: YARN-1420.patch


 From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1607/console :
 {code}
 Running org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 65.78 sec 
  FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
 testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) 
  Time elapsed: 3.125 sec   FAILURE!
 junit.framework.AssertionFailedError: null
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testUpdatedNodes(TestRMContainerAllocator.java:779)
 {code}
 This assertion fails:
 {code}
 Assert.assertTrue(allocator.getJobUpdatedNodeEvents().isEmpty());
 {code}
 The List returned by allocator.getJobUpdatedNodeEvents() is:
 [EventType: JOB_UPDATED_NODES]



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-11-20 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828270#comment-13828270
 ] 

Jonathan Eagles commented on YARN-1343:
---

This change introduced a test failure in 
TestRMContainerAllocator#testUpdatedNodes (MAPREDUCE-5632) since it counts 
the jobUpdatedNodeEvents. Could someone ([~tucu00] or [~bikassaha]) verify the 
patch and make sure that the test reflects the new proper behavior and that 
I'm not masking a real error in the code?

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-20 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reassigned YARN-1426:
-

Assignee: Jonathan Eagles

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-20 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1426:
--

Attachment: YARN-1426.patch

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2013-11-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829381#comment-13829381
 ] 

Jonathan Eagles commented on YARN-1426:
---

Test failures:
  - TestJobCleanup is from MAPREDUCE-5552.
  -- Ran this test with and without my patch and both succeed on my desktop.

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert

2013-12-10 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1136:
--

Assignee: Chen He

 Replace junit.framework.Assert with org.junit.Assert
 

 Key: YARN-1136
 URL: https://issues.apache.org/jira/browse/YARN-1136
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Chen He
  Labels: newbie, test

 There are several places where we are using junit.framework.Assert instead of 
 org.junit.Assert.
 {code}grep -rn junit.framework.Assert hadoop-yarn-project/ 
 --include=*.java{code} 
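 The fix at each call site is mechanical, for example:
 {code}
// Before: the deprecated JUnit 3 class.
//   import junit.framework.Assert;
// After: the JUnit 4 equivalent; a static import keeps call sites terse.
import static org.junit.Assert.assertTrue;

public class AssertImportExample {
  public void check(int size) {
    assertTrue("size must be positive", size > 0);
  }
}
 {code}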



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1491) Upgrade JUnit3 TestCase to JUnit 4

2013-12-10 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-1491:
-

 Summary: Upgrade JUnit3 TestCase to JUnit 4
 Key: YARN-1491
 URL: https://issues.apache.org/jira/browse/YARN-1491
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jonathan Eagles
Assignee: Chen He


There are still four references to test classes that extend from 
junit.framework.TestCase

hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
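
For reference, the mechanical shape of the upgrade (shown on a made-up class, 
not the actual test code):
{code}
// Before (JUnit 3): discovery via the TestCase base class and testXxx
// method names.
//   public class TestFoo extends junit.framework.TestCase {
//     public void testSomething() { assertTrue(true); }
//   }
// After (JUnit 4): no base class; annotations drive discovery.
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TestFoo {
  @Test
  public void testSomething() {
    assertTrue(true);
  }
}
{code}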




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1496:
--

Assignee: (was: Jonathan Eagles)

 Protocol additions to allow moving apps between queues
 --

 Key: YARN-1496
 URL: https://issues.apache.org/jira/browse/YARN-1496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Sandy Ryza





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (YARN-1496) Protocol additions to allow moving apps between queues

2013-12-11 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reassigned YARN-1496:
-

Assignee: Jonathan Eagles  (was: Sandy Ryza)

 Protocol additions to allow moving apps between queues
 --

 Key: YARN-1496
 URL: https://issues.apache.org/jira/browse/YARN-1496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Jonathan Eagles





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1180) Update capacity scheduler docs to include types on the configs

2013-12-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1180:
--

Fix Version/s: (was: 2.4.0)

 Update capacity scheduler docs to include types on the configs
 --

 Key: YARN-1180
 URL: https://issues.apache.org/jira/browse/YARN-1180
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Thomas Graves
Assignee: Chen He
  Labels: documentation, newbie
 Attachments: Yarn-1180.patch


 The capacity scheduler docs 
 (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
  don't include types for all the configs. For instance, 
 minimum-user-limit-percent doesn't say it's an Int.  It is also the only 
 setting among the Resource Allocation configs that is an Int rather than a 
 float.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1180) Update capacity scheduler docs to include types on the configs

2013-12-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853422#comment-13853422
 ] 

Jonathan Eagles commented on YARN-1180:
---

Thanks for the patch, Chen. I have taken a look and noticed that you have 
added the types on the configs. Everything looks good there. One thing I did 
notice is that user-metrics.enable, resource-calculator, node-locality-delay, 
and possibly others have been left undocumented for some time. I'm okay with 
doing that work as part of another JIRA or with expanding the scope of this 
JIRA to cover it. 

Jon

 Update capacity scheduler docs to include types on the configs
 --

 Key: YARN-1180
 URL: https://issues.apache.org/jira/browse/YARN-1180
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9
Reporter: Thomas Graves
Assignee: Chen He
  Labels: documentation, newbie
 Attachments: Yarn-1180.patch


 The capacity scheduler docs 
 (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
  don't include types for all the configs. For instance, 
 minimum-user-limit-percent doesn't say it's an Int.  It is also the only 
 setting among the Resource Allocation configs that is an Int rather than a 
 float.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response

2014-01-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881180#comment-13881180
 ] 

Jonathan Eagles commented on YARN-1479:
---

Thanks, Chen. A couple of minor things and a question for you.
* There are a couple of unnecessary imports in TestApplicationMasterService. 
Let's get those cleaned up before this patch goes in.
* progressCheck - the function will be better off package-private, since the 
intention is not to advertise new functionality.
* progressCheck - this function should be renamed, since a check is a question 
and the name gives no indication that something is being modified. Perhaps 
progressFilter, or hopefully you can think of something better; see the 
sketch below.
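
For concreteness, the kind of guard under discussion might look like this 
sketch (the name and bounds follow the review comments above, not the 
committed patch):
{code}
public final class ProgressSketch {
  // Map NaN/infinity to 0 and clamp into [0, 1] so an invalid float
  // never reaches the JSON serializer as an unquoted NaN.
  static float progressFilter(float progress) {
    if (Float.isNaN(progress) || Float.isInfinite(progress)) {
      return 0f;
    }
    return Math.min(1f, Math.max(0f, progress));
  }
}
{code}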


 Invalid NaN values in Hadoop REST API JSON response
 ---

 Key: YARN-1479
 URL: https://issues.apache.org/jira/browse/YARN-1479
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
 Fix For: 2.4.0

 Attachments: Yarn-1479.patch


 I've been occasionally coming across instances where Hadoop's Cluster 
 Applications REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API)
  has returned JSON that PHP's json_decode function failed to parse.  I've 
 tracked the syntax error down to the presence of the unquoted word NaN 
 appearing as a value in the JSON.  For example:
 "progress":NaN,
 NaN is not part of the JSON spec, so its presence renders the whole JSON 
 string invalid.  Hadoop needs to return something other than NaN in this case 
 -- perhaps an empty string or the quoted string "NaN".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884421#comment-13884421
 ] 

Jonathan Eagles commented on YARN-1632:
---

Thanks for the patch, Chen. It looks like the patch added a temp file by 
mistake. 

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Attachments: yarn-1632.patch, yarn-1632v2.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885506#comment-13885506
 ] 

Jonathan Eagles commented on YARN-1632:
---

+1. Simple fix. Thanks, Chen.

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Attachments: yarn-1632v2.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-31 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1632:
--

Fix Version/s: 3.0.0
   2.4.0

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Fix For: 3.0.0, 2.4.0

 Attachments: yarn-1632v2.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1479) Invalid NaN values in Hadoop REST API JSON response

2014-02-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905740#comment-13905740
 ] 

Jonathan Eagles commented on YARN-1479:
---

+1. Making a minor tweak to the sleep time since it was causing the test to 
take 1 minute longer than needed on my box.

 Invalid NaN values in Hadoop REST API JSON response
 ---

 Key: YARN-1479
 URL: https://issues.apache.org/jira/browse/YARN-1479
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 0.23.6, 2.0.4-alpha
Reporter: Kendall Thrapp
Assignee: Chen He
 Attachments: Yarn-1479.patch, Yarn-1479v2.patch


 I've been occasionally coming across instances where Hadoop's Cluster 
 Applications REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API)
  has returned JSON that PHP's json_decode function failed to parse.  I've 
 tracked the syntax error down to the presence of the unquoted word NaN 
 appearing as a value in the JSON.  For example:
 "progress":NaN,
 NaN is not part of the JSON spec, so its presence renders the whole JSON 
 string invalid.  Hadoop needs to return something other than NaN in this case 
 -- perhaps an empty string or the quoted string "NaN".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2830:
-

 Summary: Add backwords compatible ContainerId.newInstance 
constructor for use within Tez Local Mode
 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


YARN-2229 modified the private unstable api for constructing ContainerIds. Tez 
uses this api (it shouldn't, but it does) for Tez Local Mode. This causes a 
NoSuchMethodError when using Tez compiled against pre-2.6. Instead I propose we 
re-add the backwards compatible api, since overflow is not a problem in tez 
local mode.
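
A minimal sketch of the proposed shim, using placeholder types (the real 
ContainerId factory goes through YARN's records mechanism, so this is 
illustrative only):
{code}
class AppAttemptRef { }  // placeholder for ApplicationAttemptId

public class ContainerIdShimSketch {
  final AppAttemptRef attempt;
  final long id;

  ContainerIdShimSketch(AppAttemptRef attempt, long id) {
    this.attempt = attempt;
    this.id = id;
  }

  // New API (post YARN-2229): a long id carries the RM-restart epoch bits.
  public static ContainerIdShimSketch newContainerId(
      AppAttemptRef attempt, long containerId) {
    return new ContainerIdShimSketch(attempt, containerId);
  }

  // Re-added old signature, kept for binary compatibility with callers
  // (such as Tez local mode) compiled against the pre-2.6 method.
  @Deprecated
  public static ContainerIdShimSketch newInstance(
      AppAttemptRef attempt, int containerId) {
    // An int id has no epoch bits set, so widening to long is lossless.
    return newContainerId(attempt, containerId);
  }
}
{code}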



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-11-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202511#comment-14202511
 ] 

Jonathan Eagles commented on YARN-2229:
---

FYI: Filed YARN-2830 to help tez deal with this internal api change in yarn.

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.12.patch, 
 YARN-2229.13.patch, YARN-2229.14.patch, YARN-2229.15.patch, 
 YARN-2229.16.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed the containerId format: the upper 10 bits are for 
 the epoch, and the lower 22 bits are for the sequence number of Ids. This 
 preserves the semantics of {{ContainerId#getId()}}, 
 {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, 
 {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is 
 that the epoch can overflow after the RM restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to 
 define the new format of the container Id while preserving backward 
 compatibility on this JIRA.
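 The arithmetic behind the overflow concern, as a small illustrative sketch 
 (not the actual ContainerId code):
 {code}
public class ContainerIdBits {
  static final int SEQ_BITS = 22;                 // lower 22 bits: sequence
  static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  // Pack the epoch (upper 10 bits of a 32-bit id) with the sequence number.
  static int pack(int epoch, int sequence) {
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  public static void main(String[] args) {
    int id = pack(3, 42);
    System.out.println("epoch=" + (id >>> SEQ_BITS)
        + " seq=" + (id & SEQ_MASK));             // epoch=3 seq=42
    // With only 10 epoch bits, the 1025th epoch wraps back to 0 -- the
    // overflow this JIRA avoids by widening the id to a long.
    System.out.println("epoch after 1024 restarts = "
        + (pack(1024, 42) >>> SEQ_BITS));         // prints 0
  }
}
 {code}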



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202516#comment-14202516
 ] 

Jonathan Eagles commented on YARN-2830:
---

Working on validating this patch.

 Add backwords compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch


 YARN-2229 modified the private unstable api for constructing ContainerIds. 
 Tez uses this api (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when using Tez compiled against pre-2.6. Instead I 
 propose we re-add the backwards compatible api, since overflow is not a 
 problem in tez local mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2830:
--
Attachment: YARN-2830-v1.patch

 Add backwords compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch


 YARN-2229 modified the private unstable api for constructing ContainerIds. 
 Tez uses this api (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when using Tez compiled against pre-2.6. Instead I 
 propose we re-add the backwards compatible api, since overflow is not a 
 problem in tez local mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202593#comment-14202593
 ] 

Jonathan Eagles commented on YARN-2830:
---

[~ozawa], I understand your fix, and that is the correct fix in Tez. But right 
now I am looking at clusters running Tez 0.5.1 on Hadoop 2.5.1. Those clusters 
can't be upgraded to Hadoop 2.6.0 without breaking Tez. This is purely to 
maintain backwards compatibility in Hadoop. 

 Add backwords compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch


 YARN-2229 modified the private unstable api for constructing ContainerIds. 
 Tez uses this api (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when using Tez compiled against pre-2.6. Instead I 
 propose we re-add the backwards compatible api, since overflow is not a 
 problem in tez local mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2830:
--
Attachment: YARN-2830-v2.patch

The v2 patch is validated to work with the existing Tez 0.5.1 release, 
compiled against Hadoop 2.5.1 and running on Hadoop 2.6.0.

 Add backwords compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch


 YARN-2229 modified the private unstable api for constructing ContainerIds. 
 Tez uses this api (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when using Tez compiled against pre-2.6. Instead I 
 propose we re-add the backwards compatible api, since overflow is not a 
 problem in tez local mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202624#comment-14202624
 ] 

Jonathan Eagles commented on YARN-2830:
---

[~ozawa], I've validated this patch and added the deprecated flag. Filed 
TEZ-1755 to stop using this deprecated api.

 Add backwords compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch


 YARN-2229 modified the private unstable api for constructing ContainerIds. 
 Tez uses this api (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when using Tez compiled against pre-2.6. Instead I 
 propose we re-add the backwards compatible api, since overflow is not a 
 problem in tez local mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2830) Add backwords compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202644#comment-14202644
 ] 

Jonathan Eagles commented on YARN-2830:
---

[~ozawa], Cross-posting [~hitesh]'s comment from TEZ-1755. When can we expect 
to mark ContainerId.newInstance as public stable to avoid this type of 
incompatibility in the future?

 Add backwards compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch


 YARN-2229 modified the private, unstable API for constructing ContainerIds. 
 Tez uses this API (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when running Tez compiled against pre-2.6 Hadoop. Instead 
 I propose we add back the backwards-compatible API, since container id 
 overflow is not a problem in Tez Local Mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2830:
--
Attachment: YARN-2830-v3.patch

[~sseth], [~ozawa], the new patch renames newInstance to newContainerId and 
re-adds the old newInstance. I'm open to other names for the new API. Please 
review when you get a chance.

 Add backwards compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, 
 YARN-2830-v3.patch


 YARN-2229 modified the private, unstable API for constructing ContainerIds. 
 Tez uses this API (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when running Tez compiled against pre-2.6 Hadoop. Instead 
 I propose we add back the backwards-compatible API, since container id 
 overflow is not a problem in Tez Local Mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-08 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2830:
--
Attachment: YARN-2830-v4.patch

 Add backwards compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch, 
 YARN-2830-v3.patch, YARN-2830-v4.patch


 YARN-2229 modified the private, unstable API for constructing ContainerIds. 
 Tez uses this API (it shouldn't, but it does) for Tez Local Mode. This causes 
 a NoSuchMethodError when running Tez compiled against pre-2.6 Hadoop. Instead 
 I propose we add back the backwards-compatible API, since container id 
 overflow is not a problem in Tez Local Mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS

2014-11-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2513:
--
Attachment: YARN-2513-v2.patch

Refreshing the patch.

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch


 Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
 infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2375:
--
Description: 
This JIRA is to remove the ATS-enabled flag check within the 
TimelineClientImpl. An example where this fails: while running a secure 
timeline server with the ATS flag set to disabled on the resource manager, 
the timeline delegation token renewer throws an NPE.

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ATS-enabled flag check within the 
 TimelineClientImpl. An example where this fails: while running a secure 
 timeline server with the ATS flag set to disabled on the resource manager, 
 the timeline delegation token renewer throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212964#comment-14212964
 ] 

Jonathan Eagles commented on YARN-2375:
---

[~zjshen], you misunderstand my request. I am proposing to retain the flag. 
However, the responsibility for checking whether the ATS is enabled needs to 
live outside of the TimelineClientImpl. In fact, the existing YARN code 
assumes the design I am proposing: YarnClient checks the value of ats.enabled 
and only then creates the TimelineClientImpl, which then redundantly 
re-checks ats.enabled. Checking at the caller is the preferred object design.

The issue lies in the fact that the timeline delegation token renewer creates 
a TimelineClient because it holds a timeline server delegation token; that is 
proof enough that a TimelineClient needs to be created. This goes back to my 
original design constraint: ats.enabled must be able to be turned off 
globally and enabled at the per-job/framework level.

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ATS-enabled flag check within the 
 TimelineClientImpl. An example where this fails: while running a secure 
 timeline server with the ATS flag set to disabled on the resource manager, 
 the timeline delegation token renewer throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218507#comment-14218507
 ] 

Jonathan Eagles commented on YARN-2375:
---

This code looks good to me. [~zjshen], can you give a final review?

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch


 This JIRA is to remove the ATS-enabled flag check within the 
 TimelineClientImpl. An example where this fails: while running a secure 
 timeline server with the ATS flag set to disabled on the resource manager, 
 the timeline delegation token renewer throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-20 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219930#comment-14219930
 ] 

Jonathan Eagles commented on YARN-2375:
---

I think creating a separate ticket for enabling the timeline server in the 
mini MR cluster is a good idea. The changes look good to me. [~zjshen], any 
additional feedback before this goes in?

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, 
 YARN-2375.patch


 This JIRA is to remove the ATS-enabled flag check within the 
 TimelineClientImpl. An example where this fails: while running a secure 
 timeline server with the ATS flag set to disabled on the resource manager, 
 the timeline delegation token renewer throws an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2900:
-

 Summary: Application Not Found in AHS throws Internal Server Error 
with NPE
 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223552#comment-14223552
 ] 

Jonathan Eagles commented on YARN-2900:
---

An application not found in the history store should be a normal case, not an 
exceptional one, in the REST API, since the application id is user-provided 
information.
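As a sketch of the intended behavior (a hypothetical handler; only the JAX-RS 
classes already visible in the stack trace are real), a missing application id 
should surface as 404 Not Found rather than as an NPE-driven 500:

{code}
// AppLookupSketch.java -- hypothetical handler, not the actual AHS web service.
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Response;

public final class AppLookupSketch {

  // Stand-in for the history-store lookup; null means the id is unknown.
  static String getApplicationReport(String appId) {
    return null;
  }

  static String getApp(String appId) {
    String report = getApplicationReport(appId);
    if (report == null) {
      // Normal case for user-provided input: answer 404 instead of letting
      // a NullPointerException bubble up as INTERNAL_SERVER_ERROR.
      throw new WebApplicationException(Response
          .status(Response.Status.NOT_FOUND)
          .entity("application " + appId + " not found")
          .build());
    }
    return report;
  }

  public static void main(String[] args) {
    try {
      getApp("application_1416586084624_0011");
    } catch (WebApplicationException e) {
      System.out.println("HTTP " + e.getResponse().getStatus()); // 404
    }
  }
}
{code}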

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2900:
--
Description: 
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
... 59 more

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223621#comment-14223621
 ] 

Jonathan Eagles commented on YARN-2900:
---

[~zjshen], please don't jump to any conclusions. This is my setup, which I 
believe is a supported configuration for 2.6.0.

{quote}
yarn.timeline-service.generic-application-history.enabled=false
yarn.timeline-service.generic-application-history.store-class=org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
{quote}

The Tez UI makes applicationhistory REST API calls to gather fine-grained 
details for those who have it enabled. In my case, where generic history is 
disabled, this causes massive flooding of the log files.

As for not finding the duplicate JIRA: I was unable to find this issue via 
search. Try to include searchable details (stack traces, logs, class/file 
names) so that users can find the appropriate issue.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223671#comment-14223671
 ] 

Jonathan Eagles commented on YARN-2900:
---

Now that I am looking at the code, I do see something suspicious in the log 
file:

2014-11-24 22:12:42,107 [main] WARN 
applicationhistoryservice.ApplicationHistoryServer: The filesystem based 
application history store is deprecated.

Looking into this.

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223676#comment-14223676
 ] 

Jonathan Eagles commented on YARN-2900:
---

The issue is spacing in the config file. Here is the updated stack trace.

{quote}
2014-11-24 22:34:53,900 [17694135@qtp-11347161-6] WARN 
webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for 
application application_1416586084624_0011 doesn't exist in the timeline store
at 
org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowException(WebServices.java:452)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:227)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApp(AHSWebServices.java:95)

Caused by: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The 
entity for application application_1416586084624_0011 doesn't exist in the 
timeline store
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:542)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:94)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
at 
org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
... 59 more
{quote}

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application Not Found in AHS throws Internal Server Error with NPE

2014-11-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223694#comment-14223694
 ] 

Jonathan Eagles commented on YARN-2900:
---

FYI: Here is the config that was causing the original failure. Notice the 
newline as part of the value.

{quote}
   <property>
     <description>Store class name for history store, defaulting to file system store</description>
     <name>yarn.timeline-service.generic-application-history.store-class</name>
     <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
     </value>
   </property>
{quote}

The Internal Server Error still happens with 
ApplicationHistoryManagerOnTimelineStore, which this issue now tracks.
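A tiny illustration of the whitespace failure mode (plain Java, not the Hadoop 
Configuration class): the stray newline inside the XML value becomes part of 
the class name, so the reflective load fails.

{code}
// TrailingWhitespaceSketch.java -- hedged illustration of the config bug.
public final class TrailingWhitespaceSketch {
  public static void main(String[] args) throws Exception {
    String clean = "java.util.ArrayList";
    String fromXml = "java.util.ArrayList\n "; // value with embedded newline

    System.out.println(Class.forName(clean)); // loads fine
    try {
      Class.forName(fromXml);                 // throws ClassNotFoundException
    } catch (ClassNotFoundException e) {
      // The whitespace is invisible in logs unless you bracket the value.
      System.out.println("failed to load: [" + fromXml + "]");
    }
  }
}
{code}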

 Application Not Found in AHS throws Internal Server Error with NPE
 --

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)

2014-12-04 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234967#comment-14234967
 ] 

Jonathan Eagles commented on YARN-2900:
---

+1. [~zjshen], any last comments before this goes in?

 Application (Attempt and Container) Not Found in AHS results in Internal 
 Server Error (500)
 ---

 Key: YARN-2900
 URL: https://issues.apache.org/jira/browse/YARN-2900
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Mit Desai
 Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, 
 YARN-2900.patch, YARN-2900.patch


 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
   at 
 org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
   ... 59 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2971) RM uses conf instead of service to renew timeline delegation tokens

2014-12-16 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2971:
-

 Summary: RM uses conf instead of service to renew timeline 
delegation tokens
 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


The TimelineClientImpl renewDelegationToken method uses the incorrect web 
address to renew timeline delegation tokens. It should read the service 
address out of the token itself when renewing the delegation token.
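A minimal sketch of the proposed fix (stand-in Token type, not the actual 
TimelineClientImpl): derive the renewal target from the token's service field 
rather than from the local configuration.

{code}
// TokenRenewSketch.java -- hypothetical types; illustrates the address choice.
import java.util.Map;

public final class TokenRenewSketch {

  // Stand-in for org.apache.hadoop.security.token.Token.
  static final class Token {
    private final String service; // e.g. "timelinehost:8188"
    Token(String service) { this.service = service; }
    String getService() { return service; }
  }

  static String renewTarget(Token token, Map<String, String> conf) {
    // Wrong: the locally configured address may not be the issuing server.
    //   conf.get("yarn.timeline-service.webapp.address");
    // Right: trust the service address embedded in the token itself.
    return token.getService();
  }

  public static void main(String[] args) {
    System.out.println(renewTarget(new Token("timelinehost:8188"), Map.of()));
  }
}
{code}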



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2971) RM uses conf instead of service address to renew timeline delegation tokens

2014-12-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2971:
--
Summary: RM uses conf instead of service address to renew timeline 
delegation tokens  (was: RM uses conf instead of service to renew timeline 
delegation tokens)

 RM uses conf instead of service address to renew timeline delegation tokens
 ---

 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles

 The TimelineClientImpl renewDelegationToken method uses the incorrect web 
 address to renew timeline delegation tokens. It should read the service 
 address out of the token itself when renewing the delegation token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens

2014-12-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2971:
--
Summary: RM uses conf instead of token service address to renew timeline 
delegation tokens  (was: RM uses conf instead of service address to renew 
timeline delegation tokens)

 RM uses conf instead of token service address to renew timeline delegation 
 tokens
 -

 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles

 The TimelineClientImpl renewDelegationToken method uses the incorrect web 
 address to renew timeline delegation tokens. It should read the service 
 address out of the token itself when renewing the delegation token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens

2014-12-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2971:
--
Attachment: YARN-2971-v1.patch

 RM uses conf instead of token service address to renew timeline delegation 
 tokens
 -

 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2971-v1.patch


 The TimelineClientImpl renewDelegationToken method uses the incorrect web 
 address to renew timeline delegation tokens. It should read the service 
 address out of the token itself when renewing the delegation token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens

2014-12-16 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249461#comment-14249461
 ] 

Jonathan Eagles commented on YARN-2971:
---

The findbugs warnings are unrelated to this patch.

 RM uses conf instead of token service address to renew timeline delegation 
 tokens
 -

 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2971-v1.patch


 The TimelineClientImpl renewDelegationToken method uses the incorrect web 
 address to renew timeline delegation tokens. It should read the service 
 address out of the token itself when renewing the delegation token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

