[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v3.patch

 getSmapBasedCumulativeRssmem() should be private
-Fixed

 When converting #pages to bytes, use PAGE_SIZE instead of hard-coding 1024.
-smaps reports values in kB, which need to be converted to bytes, so 1024 is 
the correct multiplier here. PAGE_SIZE is usually 4096, which would give a wrong 
value in getSmapBasedCumulativeRssmem.
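
To illustrate the unit point above, here is a minimal sketch of the conversion. 
The class, constant and method names are hypothetical and are not taken from the 
patch; the only assumption is that smaps lines carry values in kB.

```java
// Illustrative sketch only; names are hypothetical, not from YARN-1775.
final class SmapsUnits {
  // smaps values are reported in kB, so bytes = value * 1024.
  static final long KB_TO_BYTES = 1024L;

  static long kbToBytes(long kb) {
    return kb * KB_TO_BYTES;
  }

  public static void main(String[] args) {
    long rssKb = 8; // e.g. taken from a line such as "Rss:   8 kB"
    System.out.println(kbToBytes(rssKb)); // 8192, the intended byte count
    System.out.println(rssKb * 4096L);    // 32768, what a 4096-byte page size would give
  }
}
```

Multiplying by the page size would only be right if the input were a page count, 
which is not the case for smaps.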

 Move the constant PROCFS_SMAPS_ENABLED to YarnConfiguration
-Fixed.

 Suggestions for renames
 PROCFS_SMAPS_ENABLED - PROCFS_USE_SMAPS_BASED_RSS
 yarn.nodemanager.container-monitor.process-tree.smaps.enabled - 
 yarn.nodemanager.container-monitor.procfs-based-proces-tree.smaps-based-rss.enabled.
  (Did I just say that?  )
-Fixed 
(yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled).  
Still long I believe.

 ProcessMemInfo - ProcessTreeSmapMemInfo?, MemoryMappingInfo - 
 ProcessSmapMemoryInfo, moduleMemList - memoryInfoList, processSMAPTree 
 should be cleared in every iteration of updating the process-tree
-Fixed

 isSmapEnabled() should be private
-Removed this method completely. As a part of setConf() call, smapEnabled is 
computed.
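
A rough sketch of computing the flag once in setConf(), for readers following 
along. It uses java.util.Properties as a stand-in for Hadoop's Configuration so 
that it has no dependencies; the class name, the accessor and the default of 
false are assumptions made for this sketch, and only the key comes from this 
thread.

```java
import java.util.Properties;

// Hypothetical stand-in for the process-tree class; not the patch's code.
final class SmapsFlagHolder {
  // Key as quoted in this comment.
  static final String SMAPS_KEY =
      "yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled";

  private boolean smapsEnabled; // computed once, when the configuration is set

  void setConf(Properties conf) {
    // Default of "false" is an assumption for this sketch.
    smapsEnabled = Boolean.parseBoolean(conf.getProperty(SMAPS_KEY, "false"));
  }

  // Accessor added only so the sketch can be exercised.
  boolean usesSmaps() {
    return smapsEnabled;
  }
}
```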

 MemoryMappingInfo.updateModuleMemInfo: We should skip everything else when 
 we run into integer parsing issue of the value. Right now you are logging, 
 ignoring and continuing.
-Fixed
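
A minimal sketch of the "skip on parse failure" behaviour discussed above, 
assuming smaps value lines of the form "Key:  <number> kB". The class and 
method names are hypothetical; this is not the code in the patch.

```java
import java.util.OptionalLong;

// Illustrative sketch only; names are hypothetical.
final class SmapsLineParser {
  // Returns the kB value of a line such as "Pss:      12 kB".
  // Returns empty when the line has another shape or the number does not
  // parse, so the caller does nothing further with that line.
  static OptionalLong parseKb(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length != 3 || !parts[0].endsWith(":") || !"kB".equals(parts[2])) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(parts[1]));
    } catch (NumberFormatException e) {
      // Stop here instead of logging and continuing with a bad value.
      return OptionalLong.empty();
    }
  }
}
```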

Rename MEM_INFO to MemInfo to go with other enums in the source?
-Fixed

We should probably switch the following two ifs?
-Fixed

Javadoc error
-Fixed
Reformatted the testcase as well.

While enforcing memory constraints, I wonder if people would want to use any 
other definitions of RSS to be more conservative or aggressive. Do you think 
it would make sense to provide these options separately, and have what you 
have as the default? We can punt this to a different JIRA, just wanted to 
bring it up.
-This option can be provided as an advanced/expert configuration. We can track 
it in a separate JIRA. Please feel free to open a new one.


 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated YARN-1775:
---

Attachment: YARN-1775-v4.patch

Renaming the patch as v4.



[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942872#comment-13942872
 ] 

Hadoop QA commented on YARN-1775:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635971/YARN-1775-v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3420//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3420//console

This message is automatically generated.



[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-21 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1854:
-

Attachment: YARN-1854.1.patch

Attaching patch. Please review.

I changed verifyClusterMetrics to retry up to 5 times, waiting 1 sec between 
attempts. I verified the behaviour by adding a breakpoint in CapacityScheduler, 
so that the check is retried 2 times and the QueueMetrics references are 
updated afterwards.
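
For illustration, a bounded retry with a fixed wait can be sketched as below. 
This is not the code in YARN-1854.1.patch; the class name, method name and 
signature are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; names are hypothetical.
final class RetryCheck {
  // Re-evaluates the check up to maxAttempts times, sleeping sleepMillis
  // between failed attempts. Returns true as soon as the check passes.
  static boolean retryUntilTrue(BooleanSupplier check, int maxAttempts, long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          return false;
        }
      }
    }
    return false;
  }
}
```

A test would call it as retryUntilTrue(check, 5, 1000) and assert on the 
returned value, so a metric that is updated slightly late no longer fails the 
test on the first comparison.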

 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}





[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942974#comment-13942974
 ] 

Hudson commented on YARN-1811:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either 
web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java


 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.4.0

 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, 
 YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:





[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942970#comment-13942970
 ] 

Hudson commented on YARN-1570:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. 
Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Formatting the lines within 80 chars in YarnCommands.apt.vm
 ---

 Key: YARN-1570
 URL: https://issues.apache.org/jira/browse/YARN-1570
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: YARN-1570.patch


 In YarnCommands.apt.vm, there are some lines longer than 80 characters.
 For example:
 {code}
   Yarn commands are invoked by the bin/yarn script. Running the yarn script 
 without any arguments prints the description for all commands.
 {code}





[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942972#comment-13942972
 ] 

Hudson commented on YARN-1859:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on 
the ResourceManager. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxyServlet will throw ApplicationNotFoundException if the app is no 
 longer cached in RM
 ---

 Key: YARN-1859
 URL: https://issues.apache.org/jira/browse/YARN-1859
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1859.1.patch


 WebAppProxyServlet checks for null to determine whether the application was 
 found.
 {code}
   ApplicationReport applicationReport = getApplicationReport(id);
   if(applicationReport == null) {
     LOG.warn(req.getRemoteUser()+" Attempting to access "+id+
         " that was not found");
 {code}
 However, WebAppProxyServlet calls AppReportFetcher, which in turn calls 
 ClientRMService. When the application is not found, ClientRMService throws 
 ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the logic 
 that follows, which creates the tracking URL for a non-cached app, will no 
 longer be used.
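
 One possible way to keep a null check meaningful when the lookup throws is to 
 translate the exception back into null at the call site. The sketch below uses 
 hypothetical stand-in types (a Map as the cache, a local exception class) and 
 is not necessarily what YARN-1859.1.patch does.

```java
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical stand-ins.
final class ReportLookupSketch {
  // Stand-in for ApplicationNotFoundException.
  static final class AppNotFoundException extends Exception {
    AppNotFoundException(String id) {
      super("Application " + id + " not found");
    }
  }

  // Stand-in for the RM-side lookup: throws instead of returning null.
  static String fetchReport(Map<String, String> cache, String id)
      throws AppNotFoundException {
    String report = cache.get(id);
    if (report == null) {
      throw new AppNotFoundException(id);
    }
    return report;
  }

  // Caller-side wrapper: maps "not found" back to null so that an existing
  // null check (and the fallback logic behind it) is reached again.
  static String fetchReportOrNull(Map<String, String> cache, String id) {
    try {
      return fetchReport(cache, id);
    } catch (AppNotFoundException e) {
      return null;
    }
  }
}
```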





[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942971#comment-13942971
 ] 

Hudson commented on YARN-1855:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #516 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/516/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster 
and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed 
by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 TestRMFailover#testRMWebAppRedirect fails in trunk
 --

 Key: YARN-1855
 URL: https://issues.apache.org/jira/browse/YARN-1855
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1855.1.patch, YARN-1855.1.patch, YARN-1855.2.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console :
 {code}
 testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.39 sec  <<< ERROR!
 java.lang.NullPointerException: null
   at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269)
 {code}





[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943080#comment-13943080
 ] 

Hudson commented on YARN-1570:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. 
Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm




[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943082#comment-13943082
 ] 

Hudson commented on YARN-1859:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on 
the ResourceManager. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java




[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943084#comment-13943084
 ] 

Hudson commented on YARN-1811:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either 
web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java




[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943081#comment-13943081
 ] 

Hudson commented on YARN-1855:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1708 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1708/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster 
and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed 
by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java




[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943134#comment-13943134
 ] 

Hudson commented on YARN-1811:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1811. Fixed AMFilters in YARN to correctly accept requests from either 
web-app proxy or the RMs when HA is enabled. Contributed by Robert Kanter. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579877)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilterInitializer.java




[jira] [Commented] (YARN-1855) TestRMFailover#testRMWebAppRedirect fails in trunk

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943131#comment-13943131
 ] 

Hudson commented on YARN-1855:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1855. Made Application-history server to be optional in MiniYARNCluster 
and thus avoid the failure of TestRMFailover#testRMWebAppRedirect. Contributed 
by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579838)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java




[jira] [Commented] (YARN-1570) Formatting the lines within 80 chars in YarnCommands.apt.vm

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943130#comment-13943130
 ] 

Hudson commented on YARN-1570:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1570. Fixed formatting of the lines in YarnCommands.apt.vm docs source. 
Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579797)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm




[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943132#comment-13943132
 ] 

Hudson commented on YARN-1859:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1733 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1733/])
YARN-1859. Fixed WebAppProxyServlet to correctly handle applications absent on 
the ResourceManager. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1579866)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxyServlet will throw ApplicationNotFoundException if the app is no 
 longer cached in RM
 ---

 Key: YARN-1859
 URL: https://issues.apache.org/jira/browse/YARN-1859
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1859.1.patch


 WebAppProxyServlet checks for null to determine whether the application was 
 found.
 {code}
   ApplicationReport applicationReport = getApplicationReport(id);
   if (applicationReport == null) {
     LOG.warn(req.getRemoteUser() + " Attempting to access " + id +
         " that was not found");
 {code}
 However, WebAppProxyServlet calls AppReportFetcher, which in turn calls 
 ClientRMService. When the application is not found, ClientRMService throws 
 ApplicationNotFoundException instead of returning null. Therefore, in 
 WebAppProxyServlet, the logic that follows this check to create the tracking 
 URL for a non-cached app is no longer reached.





[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1670:


Attachment: YARN-1670-v3-b23.patch

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files.
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 while the position it reads from is still inside log data from the previous 
 file.  What happened was the Log Length was written as a certain size but the 
 log data was actually longer than that.
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here: if someone is still writing 
 to the file when it is aggregated, the length written could be too small.
 We should have the write() routine stop once it has written the number of 
 bytes it said was the length.  It would be nice if we could somehow tell the 
 user it might be truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType, where it is using an 
 int for curRead whereas it should be using a long.
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits the loop.
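
The fix proposed in the description (stop writing once the declared length has been written) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from any attached patch: BoundedLogCopy and copyBounded are hypothetical names, and plain java.io streams stand in for the real LogValue writer. The byte counter is a long, in line with the remark about curRead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not the code from the YARN-1670 patch.
class BoundedLogCopy {

  /**
   * Copies at most declaredLength bytes from in to out and returns the number
   * of bytes actually copied. Anything appended to the source after the
   * length was recorded is left unread.
   */
  static long copyBounded(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[4096];
    long copied = 0; // long rather than int, so large files cannot overflow it
    while (copied < declaredLength) {
      int toRead = (int) Math.min(buf.length, declaredLength - copied);
      int len = in.read(buf, 0, toRead);
      if (len == -1) {
        break; // source ended early; the caller can compare against declaredLength
      }
      out.write(buf, 0, len);
      copied += len;
    }
    return copied;
  }

  public static void main(String[] args) throws IOException {
    // The "file" holds 10 bytes, but only 4 were declared when the length was written.
    byte[] data = "0123456789".getBytes("US-ASCII");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copyBounded(new ByteArrayInputStream(data), out, 4);
    System.out.println(copied + " bytes: " + out.toString("US-ASCII"));
  }
}
```

Returning the copied count lets a caller notice when fewer bytes than declared were available. Reporting truncation to the user, which the description leaves open, is not addressed here.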





[jira] [Updated] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1670:


Attachment: YARN-1670-v3.patch

Thanks [~vinodkv] for the feedback.
1- I changed the formatting.
2- I have modified the patch to use less memory. It should work now. I have 
also tested the new patch in my Eclipse IDE with HeapSize=1GB, and the test 
passes every time I run it.


 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch




[jira] [Created] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Ted Yu (JIRA)
Ted Yu created YARN-1863:


 Summary: TestRMFailover fails with 'AssertionError: null'
 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu


This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
{code}
testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
    at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)

testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
    at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
{code}





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943218#comment-13943218
 ] 

Hadoop QA commented on YARN-1670:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636040/YARN-1670-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3422//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3422//console

This message is automatically generated.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch




[jira] [Assigned] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1863:
---

Assignee: Xuan Gong

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong



[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1863:


Attachment: YARN-1863.1.patch

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943253#comment-13943253
 ] 

Xuan Gong commented on YARN-1863:
-

After https://issues.apache.org/jira/browse/YARN-1859, if we send an HTTP 
request with a fake application ID, the servlet no longer throws 
ApplicationNotFoundException. Instead, it sends an HTTP response with a Not 
Found message, which causes the test case failures.

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943260#comment-13943260
 ] 

Xuan Gong commented on YARN-1863:
-

Modified the test cases to verify that we receive an HTTP response with a Not 
Found message when we send an HTTP request with fakeApplicationId.

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch




[jira] [Assigned] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-1863:
-

Assignee: Zhijie Shen  (was: Xuan Gong)

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
 Attachments: YARN-1863.1.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943267#comment-13943267
 ] 

Zhijie Shen commented on YARN-1863:
---

In YARN-1859, I catch ApplicationNotFoundException and move on, because the 
client still has a chance to create the tracking URL when the application is 
not found in the RM cache. Only if the tracking URL is still not available 
after that is a Not Found HTTP response returned.

I'll handle the test failure.
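
The catch-and-continue flow described in this comment can be sketched as follows. This is a minimal illustration under assumptions, not the YARN-1859 patch: every class, method, and map here (TrackingUrlFallbackSketch, AppNotFoundException, the fallback source) is a hypothetical stand-in for the real servlet, ApplicationNotFoundException, and whatever secondary mechanism builds the tracking URL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins throughout; none of these are the real Hadoop classes.
class TrackingUrlFallbackSketch {

  /** Stand-in for the exception thrown when the RM no longer knows the app. */
  static class AppNotFoundException extends Exception {
    AppNotFoundException(String appId) {
      super("Application " + appId + " not found");
    }
  }

  private final Map<String, String> rmCache;      // apps still cached in the RM
  private final Map<String, String> fallbackUrls; // assumed secondary source

  TrackingUrlFallbackSketch(Map<String, String> rmCache,
                            Map<String, String> fallbackUrls) {
    this.rmCache = rmCache;
    this.fallbackUrls = fallbackUrls;
  }

  /** Throws instead of returning null, like the lookup described in the issue. */
  String fetchFromRm(String appId) throws AppNotFoundException {
    String url = rmCache.get(appId);
    if (url == null) {
      throw new AppNotFoundException(appId);
    }
    return url;
  }

  /** Returns the tracking URL, or null if no source can provide one. */
  String resolveTrackingUrl(String appId) {
    try {
      return fetchFromRm(appId);
    } catch (AppNotFoundException e) {
      // Catch and move on: the fallback may still produce a URL.
      return fallbackUrls.get(appId);
    }
  }

  /** 404 only when every source has failed; 200 stands in for success. */
  int statusFor(String appId) {
    return resolveTrackingUrl(appId) == null ? 404 : 200;
  }

  public static void main(String[] args) {
    Map<String, String> rm = new HashMap<>();
    rm.put("app_1", "http://rm.example/app_1");
    Map<String, String> fallback = new HashMap<>();
    fallback.put("app_2", "http://fallback.example/app_2");
    TrackingUrlFallbackSketch sketch = new TrackingUrlFallbackSketch(rm, fallback);
    System.out.println(sketch.statusFor("app_1"));
    System.out.println(sketch.statusFor("app_2"));
    System.out.println(sketch.statusFor("app_3"));
  }
}
```

A test written against the old behavior, expecting the exception to surface for a fake application ID, would fail under this flow, which matches the failure reported in this issue.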

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
 Attachments: YARN-1863.1.patch




[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1863:
--

Assignee: Xuan Gong  (was: Zhijie Shen)

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943283#comment-13943283
 ] 

Zhijie Shen commented on YARN-1863:
---

Saw that Xuan has already posted the patch, so I reassigned it to Xuan. One 
comment on the patch: please assert the response code == 404 as well.

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943296#comment-13943296
 ] 

Xuan Gong commented on YARN-1863:
-

bq. One comment on the patch: please assert the response code == 404 as well.

DONE

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch




[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1863:


Attachment: YARN-1863.2.patch

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch




[jira] [Updated] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden

2014-03-21 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1294:


Attachment: apache-yarn-1294.1.patch

Updated the patch to fix the order of assignment so that we can set map- and 
reduce-specific environment variables and override HADOOP_CLIENT_OPTS and 
HADOOP_ROOT_LOGGER.
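
Why the order of assignment matters can be shown with a small sketch. It assumes, purely for illustration, that the container environment is assembled as a map of defaults plus user-supplied values. EnvOverrideOrder and buildEnv are hypothetical names, the logger values are example values only, and none of this is taken from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the code from the attached patch.
class EnvOverrideOrder {

  /**
   * Builds a child environment by applying defaults first and user-supplied
   * values second, so a user-supplied value wins when both define a key.
   * The input maps are left unmodified.
   */
  static Map<String, String> buildEnv(Map<String, String> defaults,
                                      Map<String, String> userOverrides) {
    Map<String, String> env = new LinkedHashMap<>(defaults);
    env.putAll(userOverrides); // applied last, so these take precedence
    return env;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new LinkedHashMap<>();
    defaults.put("HADOOP_ROOT_LOGGER", "INFO,console"); // example value
    defaults.put("HADOOP_CLIENT_OPTS", "");             // example value
    Map<String, String> user = new LinkedHashMap<>();
    user.put("HADOOP_ROOT_LOGGER", "DEBUG,console");    // example value
    System.out.println(buildEnv(defaults, user));
  }
}
```

If the two steps were swapped, the defaults would silently replace the user's settings, which is the kind of symptom the issue describes (setting HADOOP_ROOT_LOGGER has no effect).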

 Log4j settings in container-log4j.properties cannot be overridden 
 --

 Key: YARN-1294
 URL: https://issues.apache.org/jira/browse/YARN-1294
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Eugene Koifman
Assignee: Varun Vasudev
 Attachments: apache-yarn-1294.0.patch, apache-yarn-1294.1.patch


 setting HADOOP_ROOT_LOGGER, -Dhadoop.root.logger has no effect





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943356#comment-13943356
 ] 

Vinod Kumar Vavilapalli commented on YARN-1670:
---

Looks good, checking this in.

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Long.parseLong(Long.java:441)
 at java.lang.Long.parseLong(Long.java:483)
 at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 while its position was still inside the log data of the previous file.  What happened 
 was that the log length was written as a certain size but the log data was 
 actually longer than that.  
 Inside the write() routine in LogValue, it first writes the logfile 
 length, but when it then writes the log itself it just reads to the 
 end of the file.  There is a race condition here: if someone is still 
 writing to the file when it is aggregated, the length written could be 
 too small.
 We should have the write() routine stop once it has written whatever it said was 
 the length.  It would be nice if we could somehow tell the user the log might be 
 truncated, but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType, where it is using 
 an int for curRead whereas it should be using a long. 
    while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now, as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
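
The int-versus-long remark above can be illustrated with a small sketch. It is not taken from AggregatedLogFormat; the class and method names are hypothetical, and it only shows what would happen to an int byte counter if a log file grew past 2 GiB, next to the same counter held in a long.

```java
final class CurReadOverflow {
  // Hypothetical helper: advances a byte counter kept in an int.
  static int advanceInt(int curRead, int len) {
    return curRead + len; // silently wraps past Integer.MAX_VALUE
  }

  // Hypothetical helper: the same counter kept in a long.
  static long advanceLong(long curRead, int len) {
    return curRead + len; // no wrap for realistic file sizes
  }

  public static void main(String[] args) {
    long fileLength = 3L * 1024 * 1024 * 1024; // assumed 3 GiB log file
    int nearLimit = Integer.MAX_VALUE - 100;   // counter just below 2 GiB

    int wrapped = advanceInt(nearLimit, 4096);
    long correct = advanceLong(nearLimit, 4096);

    // The wrapped int is negative, so "curRead < fileLength" can never
    // become false on its own; the long counter keeps counting upward.
    System.out.println("int counter:  " + wrapped + " (< fileLength: " + (wrapped < fileLength) + ")");
    System.out.println("long counter: " + correct + " (< fileLength: " + (correct < fileLength) + ")");
  }
}
```

With an int counter the loop would depend entirely on the len != -1 condition to terminate, which matches the observation that the bug is currently masked.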





[jira] [Commented] (YARN-1294) Log4j settings in container-log4j.properties cannot be overridden

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943365#comment-13943365
 ] 

Vinod Kumar Vavilapalli commented on YARN-1294:
---

This belongs to MapReduce, moving it.

 Log4j settings in container-log4j.properties cannot be overridden 
 --

 Key: YARN-1294
 URL: https://issues.apache.org/jira/browse/YARN-1294
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Eugene Koifman
Assignee: Varun Vasudev
 Attachments: apache-yarn-1294.0.patch, apache-yarn-1294.1.patch


 setting HADOOP_ROOT_LOGGER, -Dhadoop.root.logger has no effect





[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart

2014-03-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943381#comment-13943381
 ] 

Karthik Kambatla commented on YARN-1336:


Thanks for the update, Jason.

I just tried it on a pseudo-dist cluster: ongoing containers continue to make 
progress across an NM restart. It looks very neat! I have only skimmed 
the rollup patch so far, but things look promising. 

 Work-preserving nodemanager restart
 ---

 Key: YARN-1336
 URL: https://issues.apache.org/jira/browse/YARN-1336
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1336-rollup.patch


 This serves as an umbrella ticket for tasks related to work-preserving 
 nodemanager restart.





[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943389#comment-13943389
 ] 

Hadoop QA commented on YARN-1838:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635874/YARN-1838.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3423//console

This message is automatically generated.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch


 To support pagination, we need the ability to get entities starting from a certain ID by 
 providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.
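
The paging scheme described above (page size 10, limit=11, inclusive fromid) can be sketched as follows. This is only an illustration under stated assumptions: FromIdPaging and its methods are hypothetical names, the server is simulated by an in-memory list of ids, and a real client would also URL-encode the fromid value. The extra 11th entity is never displayed; it only supplies the fromid for the next request.

```java
import java.util.ArrayList;
import java.util.List;

final class FromIdPaging {
  static final int PAGE_SIZE = 10;

  // Builds a request URL in the shape used in the description above.
  static String buildUrl(String base, String fromId, int limit) {
    StringBuilder sb = new StringBuilder(base)
        .append("?fields=events,primaryfilters,otherinfo");
    if (fromId != null) {
      sb.append("&fromid=").append(fromId);
    }
    return sb.append("&limit=").append(limit).toString();
  }

  // Simulated server: returns up to 'limit' ids starting at fromId, inclusive.
  static List<String> fetch(List<String> all, String fromId, int limit) {
    int start = (fromId == null) ? 0 : all.indexOf(fromId);
    if (start < 0) {
      return new ArrayList<>(); // unknown id: nothing to return
    }
    int end = Math.min(all.size(), start + limit);
    return new ArrayList<>(all.subList(start, end));
  }

  // The (PAGE_SIZE + 1)-th entity, if present, is the next page's fromid.
  static String nextFromId(List<String> fetched) {
    return fetched.size() > PAGE_SIZE ? fetched.get(PAGE_SIZE) : null;
  }

  public static void main(String[] args) {
    List<String> all = new ArrayList<>();
    for (int i = 1; i <= 25; i++) {
      all.add("JID" + i);
    }
    String base = "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID";
    String fromId = null;
    do {
      System.out.println(buildUrl(base, fromId, PAGE_SIZE + 1));
      List<String> fetched = fetch(all, fromId, PAGE_SIZE + 1);
      // Only the first PAGE_SIZE entities are shown to the user.
      System.out.println("  page: " + fetched.subList(0, Math.min(PAGE_SIZE, fetched.size())));
      fromId = nextFromId(fetched);
    } while (fromId != null);
  }
}
```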





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943395#comment-13943395
 ] 

Hudson commented on YARN-1670:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5371 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5371/])
YARN-1670. Fixed a bug in log-aggregation that can cause the writer to write 
more log-data than the log-length that it records. Contributed by Mit Desai. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580005)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java


 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch




[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943396#comment-13943396
 ] 

Hadoop QA commented on YARN-1536:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635403/yarn-1536.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3424//console

This message is automatically generated.

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie
 Attachments: yarn-1536.002.patch, yarn-1536.patch


 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.





[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943413#comment-13943413
 ] 

Hadoop QA commented on YARN-1863:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636061/YARN-1863.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3425//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3425//console

This message is automatically generated.

 TestRMFailover fails with 'AssertionError: null'
 

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Assigned] (YARN-1368) RM should populate running container allocation information from NM resync

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1368:
---

Assignee: Anubhav Dhoot

 RM should populate running container allocation information from NM resync
 --

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943421#comment-13943421
 ] 

Jonathan Eagles commented on YARN-1670:
---

Mit, I'm worried that we are still going to have this issue except in the 
opposite way. On the last read that puts us over the initial filelength, we are 
not going to write the last part of the data that still fits into the original 
filelength. In this case our Aggregate File Log length will be smaller than the 
filelength written to the data structure.

jeagles



[jira] [Reopened] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reopened YARN-1670:
---




[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943422#comment-13943422
 ] 

Jonathan Eagles commented on YARN-1670:
---

I've reopened this ticket to verify the correctness of the patch that went into 
branch-2 and branch-2.4.



[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943455#comment-13943455
 ] 

Vinod Kumar Vavilapalli commented on YARN-1863:
---

Zhijie/Xuan, can we please run all the yarn tests before committing this patch? 
Tests have been in a broken state for a while now.



[jira] [Updated] (YARN-1863) Several test failures on trunk

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1863:
--

Summary: Several test failures on trunk  (was: TestRMFailover fails with 
'AssertionError: null')

 Several test failures on trunk
 --

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch




[jira] [Commented] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943465#comment-13943465
 ] 

Zhijie Shen commented on YARN-1863:
---

I've observed more test failures on trunk:

Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.288 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload
testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload)  Time elapsed: 0.384 sec  <<< FAILURE!
java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363)

Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 33.744 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
testAMContainerAllocationWhenDNSUnavailable(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation)  Time elapsed: 5.077 sec  <<< FAILURE!
java.lang.AssertionError: expected:<SCHEDULED> but was:<ALLOCATED>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:147)
   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable(TestContainerAllocation.java:240)



[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943464#comment-13943464
 ] 

Thomas Graves commented on YARN-1670:
-

Good catch Jon.  Yep, I think you are correct here.  We can actually still write 
more than we should. It should check that curRead + len stays within fileLength 
before writing and, if it does not, write only the difference.  
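
A minimal sketch of the clamping discussed in this comment, assuming a plain InputStream/OutputStream pair rather than the actual LogValue.write() code; BoundedLogCopy and copyAtMost are hypothetical names. Each read is capped at the bytes still owed, so the final partial chunk is written and nothing beyond the recorded length is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BoundedLogCopy {
  /**
   * Copies at most fileLength bytes from in to out and returns the number
   * of bytes actually written (smaller if the source ends early).
   */
  static long copyAtMost(InputStream in, OutputStream out, long fileLength, int bufSize)
      throws IOException {
    byte[] buf = new byte[bufSize];
    long curRead = 0; // long, so files over 2 GiB do not overflow the counter
    while (curRead < fileLength) {
      long pending = fileLength - curRead;
      // Never ask for more than the bytes still owed to the recorded length.
      int toRead = (int) Math.min(buf.length, pending);
      int len = in.read(buf, 0, toRead);
      if (len == -1) {
        break; // source ended before the recorded length was reached
      }
      out.write(buf, 0, len);
      curRead += len;
    }
    return curRead;
  }

  public static void main(String[] args) throws IOException {
    // The "file" has grown to 100 bytes, but the recorded length is 70.
    byte[] grown = new byte[100];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long written = copyAtMost(new ByteArrayInputStream(grown), out, 70, 32);
    System.out.println("recorded=70 written=" + written + " out.size=" + out.size());
  }
}
```

If the source ends before fileLength is reached, the method returns a smaller count; how the writer should react to that case is a separate decision that this sketch leaves to the caller.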



[jira] [Commented] (YARN-1863) Several test failures on trunk

2014-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943469#comment-13943469
 ] 

Xuan Gong commented on YARN-1863:
-

I will fix all of them with this patch.

 Several test failures on trunk
 --

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943477#comment-13943477
 ] 

Vinod Kumar Vavilapalli commented on YARN-1670:
---

It'll be useful to write a test for this cast too, though..

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files.
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   at java.lang.Long.parseLong(Long.java:441)
   at java.lang.Long.parseLong(Long.java:483)
   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long.
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.





[jira] [Updated] (YARN-1863) Several YARN test failures on trunk

2014-03-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1863:


Summary: Several YARN test failures on trunk  (was: Several test failures 
on trunk)

 Several YARN test failures on trunk
 ---

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943476#comment-13943476
 ] 

Vinod Kumar Vavilapalli commented on YARN-1670:
---

Good catch, Jon!

Just checking for {{curRead + len < fileLength}} will also not work, no? We have 
to explicitly write only {{fileLength - curRead}} bytes if {{curRead + len > 
fileLength}}. Right?

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files.
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   at java.lang.Long.parseLong(Long.java:441)
   at java.lang.Long.parseLong(Long.java:483)
   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long.
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.





[jira] [Commented] (YARN-1863) Several test failures on trunk

2014-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943471#comment-13943471
 ] 

Xuan Gong commented on YARN-1863:
-

I mean related test case failures

 Several test failures on trunk
 --

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Comment Edited] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943477#comment-13943477
 ] 

Vinod Kumar Vavilapalli edited comment on YARN-1670 at 3/21/14 8:00 PM:


It'll be useful to write a test for this case too, though..


was (Author: vinodkv):
It'll be useful to write a test for this cast too, though..

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files.
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   at java.lang.Long.parseLong(Long.java:441)
   at java.lang.Long.parseLong(Long.java:483)
   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long.
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.





[jira] [Updated] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1536:


Attachment: yarn-1536.003.patch

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie
 Attachments: yarn-1536.002.patch, yarn-1536.003.patch, yarn-1536.patch


 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.





[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-03-21 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943497#comment-13943497
 ] 

Mit Desai commented on YARN-1670:
-

That's correct, Vinod. In the last iteration, where the buf length is greater 
than the remaining portion of the file, we will have to write only 
{{fileLength - curRead}} bytes.
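
For readers following the thread, the bounded write being discussed could look roughly like the sketch below. It is not the attached YARN patch: the class name BoundedLogCopy, the method copyBounded, and the 4096-byte buffer are hypothetical names and values chosen for illustration. It only assumes what the thread states, namely that curRead should be a long and that the last chunk must be clamped to fileLength - curRead bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of AggregatedLogFormat.
final class BoundedLogCopy {

  /**
   * Copies at most fileLength bytes from in to out, even if the source
   * keeps growing, and returns the number of bytes actually written.
   */
  static long copyBounded(InputStream in, OutputStream out, long fileLength)
      throws IOException {
    byte[] buf = new byte[4096];
    long curRead = 0; // long rather than int, so large logs do not overflow
    int len = in.read(buf);
    while (len != -1 && curRead < fileLength) {
      // Clamp the final chunk so the total never exceeds the declared length.
      long remaining = fileLength - curRead;
      int toWrite = (int) Math.min((long) len, remaining);
      out.write(buf, 0, toWrite);
      curRead += toWrite;
      len = in.read(buf);
    }
    return curRead;
  }
}
```

If the source is shorter than the declared length, the sketch simply stops at end of stream and returns the smaller count; how the real writer should report that case is left open here.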

 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 2.4.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files.
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   at java.lang.Long.parseLong(Long.java:441)
   at java.lang.Long.parseLong(Long.java:483)
   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 but where it reads is still log data from the previous file.  What happened 
 was the Log Length was written as a certain size but the log data was 
 actually longer than that.
 Inside of the write() routine in LogValue it first writes what the logfile 
 length is, but then when it goes to write the log itself it just goes to the 
 end of the file.  There is a race condition here where if someone is still 
 writing to the file when it goes to be aggregated the length written could be 
 too small.
 We should have the write() routine stop when it writes whatever it said was 
 the length.  It would be nice if we could somehow tell the user it might be 
 truncated but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType where it is using 
 an int for curRead whereas it should be using a long.
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.





[jira] [Assigned] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-03-21 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-304:
--

Assignee: Mayank Bansal  (was: Zhijie Shen)

 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Mayank Bansal

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution

2014-03-21 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943545#comment-13943545
 ] 

Mayank Bansal commented on YARN-304:


Taking it over


 RM Tracking Links for purged applications needs a long-term solution
 

 Key: YARN-304
 URL: https://issues.apache.org/jira/browse/YARN-304
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.5
Reporter: Derek Dagit
Assignee: Mayank Bansal

 This JIRA is intended to track a proper long-term fix for the issue described 
 in YARN-285.
 The following is from the original description:
 As applications complete, the RM tracks their IDs in a completed list. This 
 list is routinely truncated to limit the total number of applications 
 remembered by the RM.
 When a user clicks the History for a job, the browser is redirected to 
 the application's tracking link obtained from the stored application 
 instance. But when the application has been purged from the RM, an error is 
 displayed.
 In very busy clusters the rate at which applications complete can cause 
 applications to be purged from the RM's internal list within hours, which 
 breaks the proxy URLs users have saved for their jobs.
 We would like the RM to provide valid tracking links that persist so that 
 users are not frustrated by broken links.





[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943542#comment-13943542
 ] 

Hadoop QA commented on YARN-1536:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636094/yarn-1536.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3426//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3426//console

This message is automatically generated.

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie
 Attachments: yarn-1536.002.patch, yarn-1536.003.patch, yarn-1536.patch


 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.





[jira] [Commented] (YARN-1863) Several YARN test failures on trunk

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943572#comment-13943572
 ] 

Vinod Kumar Vavilapalli commented on YARN-1863:
---

I can help fix these tests. Let's cover all the failures, whether or not they 
are related.

 Several YARN test failures on trunk
 ---

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Commented] (YARN-1776) renewDelegationToken should survive RM failover

2014-03-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943585#comment-13943585
 ] 

Jian He commented on YARN-1776:
---

Some comments on the patch:
- completeRenewRecords -> checkAndResumeUpdateOperation?
- updateRMDelegationTokenAndSequenceNumberState -> 
updateRMDelegationTokenAndSequenceNumberInternal?
- For ZK, we may just use setData, instead of removing and re-creating the 
znode for updates.
- Test for FSRMStateStore: we need a test to verify that on recovery, if we 
encounter a .new file, we resume the update operation. Essentially, 
completeRenewRecords needs a test.
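
To make the last point concrete, here is a rough sketch of what resuming an update on recovery might look like for a file-based store. It is not the FSRMStateStore code: the class PendingUpdateRecovery, the method resumePendingUpdates, and the flat directory layout are hypothetical. It also assumes that a visible .new file is already complete, which a real store would have to guarantee by how it writes the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical recovery helper for illustration; not the FSRMStateStore code.
final class PendingUpdateRecovery {
  static final String NEW_SUFFIX = ".new";

  /**
   * Scans storeDir for leftover "<name>.new" files and finishes each
   * interrupted update by moving it over "<name>".
   * Returns the number of updates that were resumed.
   */
  static int resumePendingUpdates(Path storeDir) throws IOException {
    List<Path> pending = new ArrayList<>();
    try (Stream<Path> entries = Files.list(storeDir)) {
      entries.filter(p -> p.getFileName().toString().endsWith(NEW_SUFFIX))
          .forEach(pending::add);
    }
    for (Path newFile : pending) {
      String name = newFile.getFileName().toString();
      Path target = newFile.resolveSibling(
          name.substring(0, name.length() - NEW_SUFFIX.length()));
      // Assumption: the .new file holds the complete new record.
      Files.move(newFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
    return pending.size();
  }
}
```

A test along the lines requested above would create an old record plus a .new sibling, run recovery, and check that only the new content remains.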

 renewDelegationToken should survive RM failover
 ---

 Key: YARN-1776
 URL: https://issues.apache.org/jira/browse/YARN-1776
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch


 When a delegation token is renewed, two RMStateStore operations will happen: 
 1) removing the old DT, and 2) storing the new DT. If the RM fails in 
 between, there would be a problem.





[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM

2014-03-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943589#comment-13943589
 ] 

Jian He commented on YARN-1849:
---

LGTM, thanks Karthik!

 NPE in ResourceTrackerService#registerNodeManager for UAM
 -

 Key: YARN-1849
 URL: https://issues.apache.org/jira/browse/YARN-1849
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, 
 yarn-1849-3.patch, yarn-1849-4.patch, yarn-1849-5.patch, yarn-1849-6.patch


 While running an UnmanagedAM on a secure cluster, ran into an NPE on 
 failover/restart. This is similar to YARN-1821.





[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1838:
-

Attachment: YARN-1838.3.patch

New patch, hopefully fixing the compilation issue and fixing a bug in how the 
insert timestamp is determined.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch


 To support pagination, we need the ability to get entities from a certain ID 
 by providing a new param called {{fromid}}.
 For example on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.
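
The "fetch one more than the page size" pattern in the example calls can be sketched as below. This is only an in-memory illustration, not the timeline service API: the class FromIdPagination, its Page type, and the getEntities stand-in are hypothetical, and the entity ids are plain strings. The extra (11th) item is never shown; it only supplies the fromid for the next call, which works because fromid is inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory illustration; not the timeline service client.
final class FromIdPagination {

  /** One page of ids plus the id to pass as fromid for the next page. */
  static final class Page {
    final List<String> items;
    final String nextFromId; // null when there is no further page

    Page(List<String> items, String nextFromId) {
      this.items = items;
      this.nextFromId = nextFromId;
    }
  }

  /** Stand-in for the server: up to limit ids, starting at fromId inclusive. */
  static List<String> getEntities(List<String> all, String fromId, int limit) {
    int start = (fromId == null) ? 0 : all.indexOf(fromId);
    if (start < 0) {
      return new ArrayList<>();
    }
    int end = Math.min(all.size(), start + limit);
    return new ArrayList<>(all.subList(start, end));
  }

  /** Requests pageSize + 1 ids and keeps the extra one as the next cursor. */
  static Page fetchPage(List<String> all, String fromId, int pageSize) {
    List<String> fetched = getEntities(all, fromId, pageSize + 1);
    if (fetched.size() > pageSize) {
      List<String> shown = new ArrayList<>(fetched.subList(0, pageSize));
      return new Page(shown, fetched.get(pageSize));
    }
    return new Page(fetched, null);
  }
}
```

With 25 ids JID1 to JID25 and a page size of 10, the first page ends at JID10 and hands back JID11 as the next fromid, matching the URLs in the description.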





[jira] [Updated] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1775:
--

Attachment: YARN-1775-v5.patch

Same patch with a few more renames. Will check this in if Jenkins says okay.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, 
 yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 
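
As a rough illustration of the idea (not the code in the attached patches), PSS can be summed from the "Pss:" lines of /proc/<pid>/smaps. The class SmapsPssSum and the method totalPssBytes are hypothetical names. The sketch takes the lines as input rather than reading procfs, and it converts from the kB values smaps reports to bytes with a fixed factor of 1024 rather than the page size.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the ProcfsBasedProcessTree code.
final class SmapsPssSum {
  // Matches lines such as "Pss:                  12 kB" but not "SwapPss: ...".
  private static final Pattern PSS_LINE =
      Pattern.compile("^Pss:\\s+(\\d+)\\s+kB$");
  static final long KB_TO_BYTES = 1024L;

  /** Sums all Pss entries (reported in kB) and returns the total in bytes. */
  static long totalPssBytes(List<String> smapsLines) {
    long totalKb = 0;
    for (String line : smapsLines) {
      Matcher m = PSS_LINE.matcher(line.trim());
      if (m.matches()) {
        totalKb += Long.parseLong(m.group(1));
      }
    }
    return totalKb * KB_TO_BYTES;
  }
}
```

Summing over every process in the container's process tree would then give a PSS-based figure for the whole container.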



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943658#comment-13943658
 ] 

Hadoop QA commented on YARN-1838:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636118/YARN-1838.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1494 javac 
compiler warnings (more than the trunk's current 1491 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3427//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3427//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3427//console

This message is automatically generated.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch


 To support pagination, we need the ability to get entities from a certain ID 
 by providing a new param called {{fromid}}.
 For example on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Updated] (YARN-1863) Several YARN test failures on trunk

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1863:
--

Priority: Blocker  (was: Major)
Target Version/s: 2.4.0

 Several YARN test failures on trunk
 ---

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Assigned] (YARN-1863) Several YARN test failures on trunk

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-1863:
-

Assignee: Vinod Kumar Vavilapalli  (was: Xuan Gong)

 Several YARN test failures on trunk
 ---

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1838:
-

Attachment: YARN-1838.4.patch

Fixed javac warning.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch


 To support pagination, we need the ability to get entities starting from a
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1577:
--

Attachment: YARN-1577.1.patch

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker
 Attachments: YARN-1577.1.patch


 Today the unmanaged AM client waits for the app state to be Accepted before
 launching the AM. This is broken because, since YARN-1493, the attempt is
 started after the application is Accepted. We may need to introduce an attempt
 state report that the client can rely on to query the attempt state and
 decide when to launch the unmanaged AM.





[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943671#comment-13943671
 ] 

Jian He commented on YARN-1577:
---

Uploaded a patch: changed the UMA launcher to wait until the attempt reaches
the Launched state before launching the AM.
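
A rough sketch of the wait-then-launch idea, in Python for brevity. This is not
the code in the patch: the function name wait_for_attempt_state, the
get_attempt_state callback, and the state labels used in the example are all
hypothetical stand-ins for whatever the launcher actually queries. It only
illustrates polling the attempt state, rather than the application state,
before starting the AM.

```python
import time


def wait_for_attempt_state(get_attempt_state, target="LAUNCHED",
                           poll_interval=1.0, max_polls=100,
                           sleep=time.sleep):
    """Poll until the attempt reaches `target` (hypothetical helper).

    get_attempt_state is any zero-argument callable returning the current
    attempt state as a string. Returns True once the target state is seen,
    or False if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        if get_attempt_state() == target:
            return True
        sleep(poll_interval)
    return False


if __name__ == "__main__":
    # Illustrative state sequence; the labels are placeholders.
    states = iter(["NEW", "SUBMITTED", "LAUNCHED"])
    if wait_for_attempt_state(lambda: next(states), poll_interval=0.0):
        print("attempt launched; the unmanaged AM can be started now")
```

Bounding the number of polls keeps the launcher from waiting forever if the
attempt never reaches the target state.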

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker
 Attachments: YARN-1577.1.patch


 Today the unmanaged AM client waits for the app state to be Accepted before
 launching the AM. This is broken because, since YARN-1493, the attempt is
 started after the application is Accepted. We may need to introduce an attempt
 state report that the client can rely on to query the attempt state and
 decide when to launch the unmanaged AM.





[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943679#comment-13943679
 ] 

Vinod Kumar Vavilapalli commented on YARN-1854:
---

This looks fine. Running all the tests before commit so as to take care of
YARN-1863 as well.

 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests:
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}





[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1577:
--

Attachment: YARN-1577.2.patch

Fixed a typo

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker
 Attachments: YARN-1577.1.patch, YARN-1577.2.patch


 Today the unmanaged AM client waits for the app state to be Accepted before
 launching the AM. This is broken because, since YARN-1493, the attempt is
 started after the application is Accepted. We may need to introduce an attempt
 state report that the client can rely on to query the attempt state and
 decide when to launch the unmanaged AM.





[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943690#comment-13943690
 ] 

Karthik Kambatla commented on YARN-1775:


The latest patch looks good to me too. 

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, 
 yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 
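
 For illustration only, a small Python sketch of what computing memory usage
 from PSS can look like. It is not the code in the attached patches: the
 function name total_pss_bytes is hypothetical, and simply summing every
 "Pss:" entry of a /proc/<pid>/smaps dump is an assumption about how the
 values are combined. The kB-to-bytes conversion by 1024 follows the earlier
 discussion on this issue that smaps values are reported in KB.

```python
def total_pss_bytes(smaps_text):
    """Sum the Pss entries of an smaps dump and return bytes (hypothetical)."""
    total_kb = 0
    for line in smaps_text.splitlines():
        parts = line.split()
        # A matching line looks like: "Pss:                  20 kB"
        if len(parts) >= 3 and parts[0] == "Pss:" and parts[2] == "kB":
            try:
                total_kb += int(parts[1])
            except ValueError:
                # Skip entries whose value is not an integer.
                continue
    # smaps reports kB, so convert with 1024 rather than the page size.
    return total_kb * 1024


if __name__ == "__main__":
    sample = (
        "00400000-0040b000 r-xp 00000000 08:01 1234   /bin/cat\n"
        "Rss:                  40 kB\n"
        "Pss:                  20 kB\n"
        "7f0000000000-7f0000001000 rw-p 00000000 00:00 0\n"
        "Rss:                   4 kB\n"
        "Pss:                   4 kB\n"
    )
    print(total_pss_bytes(sample))
```

 On a Linux host the same function could be fed the contents of
 /proc/<pid>/smaps for each process in the tree.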





[jira] [Created] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-03-21 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created YARN-1864:


 Summary: Fair Scheduler Dynamic Hierarchical User Queues
 Key: YARN-1864
 URL: https://issues.apache.org/jira/browse/YARN-1864
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Ashwin Shankar
 Fix For: 2.4.0


In Fair Scheduler, we want to be able to create user queues under any parent
queue in the hierarchy. For example, say user1 submits a job to a parent queue
called root.allUserQueues; we want to be able to create a new queue called
root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted
by this user to root.allUserQueues will be run in this newly created
root.allUserQueues.user1.
This is very similar to the 'user-as-default' feature in Fair Scheduler, which
creates user queues under the root queue. But we want the ability to create
user queues under ANY parent queue.

Why do we want this?
1. Preemption: these dynamically created user queues can preempt each other if
their fair share is not met, so there is fairness among users.
2. Allocation to user queues: we want all the user queries (ad hoc) to consume
only a fraction of resources in the shared cluster. With this feature, we
could do that by giving a fair share to the parent user queue, which is then
redistributed to all the dynamically created user queues.
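
To make the queue placement concrete, here is a minimal Python sketch of the
behaviour requested above. It is not Fair Scheduler code: the function
resolve_user_queue, the set of parent queues, and the set of existing queues
are hypothetical stand-ins used only to show how a job submitted to a parent
queue would end up in a per-user child queue.

```python
def resolve_user_queue(requested_queue, user, user_parent_queues,
                       existing_queues):
    """Return the queue a job should run in (hypothetical helper).

    If the job targets a queue configured to hold per-user children, the
    child queue "<parent>.<user>" is created on first use and reused for
    later jobs from the same user. Otherwise the requested queue is used.
    """
    if requested_queue in user_parent_queues:
        child = requested_queue + "." + user
        existing_queues.add(child)  # no-op if the queue already exists
        return child
    return requested_queue


if __name__ == "__main__":
    queues = {"root.allUserQueues"}
    parents = {"root.allUserQueues"}
    print(resolve_user_queue("root.allUserQueues", "user1", parents, queues))
    print(sorted(queues))
```

Fair share would then be assigned to the parent queue and divided among the
child queues, as point 2 describes.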





[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-03-21 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-1864:
-

Attachment: YARN-1864-v1.txt

 Fair Scheduler Dynamic Hierarchical User Queues
 ---

 Key: YARN-1864
 URL: https://issues.apache.org/jira/browse/YARN-1864
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.4.0

 Attachments: YARN-1864-v1.txt


 In Fair Scheduler, we want to be able to create user queues under any parent
 queue in the hierarchy. For example, say user1 submits a job to a parent queue
 called root.allUserQueues; we want to be able to create a new queue called
 root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted
 by this user to root.allUserQueues will be run in this newly created
 root.allUserQueues.user1.
 This is very similar to the 'user-as-default' feature in Fair Scheduler, which
 creates user queues under the root queue. But we want the ability to create
 user queues under ANY parent queue.
 Why do we want this?
 1. Preemption: these dynamically created user queues can preempt each other
 if their fair share is not met, so there is fairness among users.
 2. Allocation to user queues: we want all the user queries (ad hoc) to consume
 only a fraction of resources in the shared cluster. With this feature, we
 could do that by giving a fair share to the parent user queue, which is then
 redistributed to all the dynamically created user queues.





[jira] [Updated] (YARN-1776) renewDelegationToken should survive RM failover

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1776:
--

Attachment: YARN-1776.4.patch

Uploaded a new patch, which addresses Jian's comments.

 renewDelegationToken should survive RM failover
 ---

 Key: YARN-1776
 URL: https://issues.apache.org/jira/browse/YARN-1776
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, 
 YARN-1776.4.patch


 When a delegation token is renewed, two RMStateStore operations happen:
 1) removing the old DT, and 2) storing the new DT. If the RM fails in between,
 there would be a problem.
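
 The failure window can be illustrated with a toy in-memory store, sketched
 here in Python. None of these names (InMemoryStateStore, renew_two_step,
 renew_single_step, CrashBetweenOps) exist in Hadoop, and the single-step
 update is only one conceivable way to close the window, not necessarily what
 the attached patches do.

```python
class CrashBetweenOps(Exception):
    """Stands in for the RM failing between the two store operations."""


class InMemoryStateStore:
    """Toy stand-in for a state store keyed by token id (hypothetical)."""

    def __init__(self):
        self.tokens = {}

    def remove(self, token_id):
        self.tokens.pop(token_id, None)

    def store(self, token_id, renew_date):
        self.tokens[token_id] = renew_date

    def update(self, token_id, renew_date):
        # A single operation: the token is never absent from the store.
        self.tokens[token_id] = renew_date


def renew_two_step(store, token_id, renew_date, crash=False):
    """Remove the old token, then store the new one, as in the description."""
    store.remove(token_id)
    if crash:
        raise CrashBetweenOps()  # the token is now missing from the store
    store.store(token_id, renew_date)


def renew_single_step(store, token_id, renew_date):
    """Replace the token in one store operation."""
    store.update(token_id, renew_date)
```

 If a crash hits renew_two_step after the removal, a restarted or failed-over
 RM would recover a store that no longer contains the token, which is the
 problem this issue describes.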





[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943700#comment-13943700
 ] 

Hadoop QA commented on YARN-1775:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636125/YARN-1775-v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3428//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3428//console

This message is automatically generated.

 Create SMAPBasedProcessTree to get PSS information
 --

 Key: YARN-1775
 URL: https://issues.apache.org/jira/browse/YARN-1775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, 
 YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, 
 yarn-1775-2.4.0.patch


 Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will 
 make use of PSS for computing the memory usage. 





[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943706#comment-13943706
 ] 

Hudson commented on YARN-1849:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5375 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5375/])
YARN-1849. Fixed NPE in ResourceTrackerService#registerNodeManager for UAM. 
Contributed by Karthik Kambatla (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580077)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 NPE in ResourceTrackerService#registerNodeManager for UAM
 -

 Key: YARN-1849
 URL: https://issues.apache.org/jira/browse/YARN-1849
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, 
 yarn-1849-3.patch, yarn-1849-4.patch, yarn-1849-5.patch, yarn-1849-6.patch


 While running an UnmanagedAM on a secure cluster, we ran into an NPE on
 failover/restart. This is similar to YARN-1821.





[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943710#comment-13943710
 ] 

Hadoop QA commented on YARN-1838:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636127/YARN-1838.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3429//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3429//console

This message is automatically generated.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch


 To support pagination, we need the ability to get entities starting from a
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943743#comment-13943743
 ] 

Hadoop QA commented on YARN-1577:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636130/YARN-1577.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3430//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3430//console

This message is automatically generated.

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker
 Attachments: YARN-1577.1.patch, YARN-1577.2.patch


 Today the unmanaged AM client waits for the app state to be Accepted before
 launching the AM. This is broken because, since YARN-1493, the attempt is
 started after the application is Accepted. We may need to introduce an attempt
 state report that the client can rely on to query the attempt state and
 decide when to launch the unmanaged AM.





[jira] [Updated] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1838:
--

Attachment: YARN-1838.5.patch

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch, YARN-1838.5.patch


 To support pagination, we need the ability to get entities starting from a
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943745#comment-13943745
 ] 

Zhijie Shen commented on YARN-1838:
---

Billie, thanks for the new patch. It looks good to me. Based on your patch, I
just made some minor touch-ups: removed an unnecessary suppresswarning,
formatted a piece of javadoc, and enhanced the test testGetEntitiesWithFromTs.

Vinod, do you want to have a look as well?

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch, YARN-1838.5.patch


 To support pagination, we need the ability to get entities starting from a
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id

2014-03-21 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943759#comment-13943759
 ] 

Billie Rinaldi commented on YARN-1838:
--

Thanks, [~zjshen].  Your updates in the latest patch look good to me.

 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch, YARN-1838.5.patch


 To support pagination, we need the ability to get entities starting from a
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.





[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943775#comment-13943775
 ] 

Hadoop QA commented on YARN-1864:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636137/YARN-1864-v1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3432//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3432//console

This message is automatically generated.

 Fair Scheduler Dynamic Hierarchical User Queues
 ---

 Key: YARN-1864
 URL: https://issues.apache.org/jira/browse/YARN-1864
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.4.0

 Attachments: YARN-1864-v1.txt


 In Fair Scheduler, we want to be able to create user queues under any parent
 queue in the hierarchy. For example, say user1 submits a job to a parent queue
 called root.allUserQueues; we want to be able to create a new queue called
 root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted
 by this user to root.allUserQueues will be run in this newly created
 root.allUserQueues.user1.
 This is very similar to the 'user-as-default' feature in Fair Scheduler, which
 creates user queues under the root queue. But we want the ability to create
 user queues under ANY parent queue.
 Why do we want this?
 1. Preemption: these dynamically created user queues can preempt each other
 if their fair share is not met, so there is fairness among users.
 2. Allocation to user queues: we want all the user queries (ad hoc) to consume
 only a fraction of resources in the shared cluster. With this feature, we
 could do that by giving a fair share to the parent user queue, which is then
 redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1776) renewDelegationToken should survive RM failover

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943773#comment-13943773
 ] 

Hadoop QA commented on YARN-1776:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636139/YARN-1776.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3431//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3431//console

This message is automatically generated.

 renewDelegationToken should survive RM failover
 ---

 Key: YARN-1776
 URL: https://issues.apache.org/jira/browse/YARN-1776
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, 
 YARN-1776.4.patch


 When a delegation token is renewed, two RMStateStore operations happen: 1) 
 removing the old DT, and 2) storing the new DT. If the RM fails in between 
 the two, there would be a problem.
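
To make the failure window concrete, here is a minimal sketch. It is not the RMStateStore API: `TokenStore`, `renewInTwoSteps`, and `renewAtomically` are hypothetical names, and an in-memory map stands in for the real store. It contrasts remove-then-store, which can leave no token behind if the process dies between the two steps, with a single overwrite.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a state store; not the real RMStateStore. */
class TokenStore {
    // token id -> renewal time in milliseconds
    private final Map<String, Long> tokens = new HashMap<>();

    void store(String tokenId, long renewDate) { tokens.put(tokenId, renewDate); }
    void remove(String tokenId) { tokens.remove(tokenId); }
    boolean contains(String tokenId) { return tokens.containsKey(tokenId); }
    Long renewDate(String tokenId) { return tokens.get(tokenId); }

    /** Two separate operations: a failure after remove() loses the token entirely. */
    void renewInTwoSteps(String tokenId, long newRenewDate, boolean failInBetween) {
        remove(tokenId);
        if (failInBetween) {
            throw new IllegalStateException("simulated RM failure between remove and store");
        }
        store(tokenId, newRenewDate);
    }

    /** One operation: the old entry is overwritten in place, so the token is never missing. */
    void renewAtomically(String tokenId, long newRenewDate) {
        tokens.put(tokenId, newRenewDate);
    }
}
```

Whether a real store can offer such a single-step update depends on the backing implementation; the sketch only shows why the two-step sequence is fragile.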





[jira] [Updated] (YARN-1863) TestRMFailover fails with 'AssertionError: null'

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1863:
--

Assignee: Xuan Gong  (was: Vinod Kumar Vavilapalli)
 Summary: TestRMFailover fails with 'AssertionError: null'   (was: Several 
YARN test failures on trunk)

Okay, I cannot reproduce any more test failures on Linux and Mac. Re-editing 
the title and assigning back to Xuan.

Checking this in for now.

 TestRMFailover fails with 'AssertionError: null' 
 -

 Key: YARN-1863
 URL: https://issues.apache.org/jira/browse/YARN-1863
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1863.1.patch, YARN-1863.2.patch


 This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
 {code}
 testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.834 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
 testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover)  Time elapsed: 5.341 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
   at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
 {code}





[jira] [Assigned] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1372:
---

Assignee: Anubhav Dhoot

 Ensure all completed containers are reported to the AMs across RM restart
 -

 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 Currently the NM informs the RM about completed containers and then removes 
 those containers from the RM notification list. The RM passes on that 
 completed container information to the AM and the AM pulls this data. If the 
 RM dies before the AM pulls this data then the AM may not be able to get this 
 information again. To fix this, NM should maintain a separate list of such 
 completed container notifications sent to the RM. After the AM has pulled the 
 containers from the RM then the RM will inform the NM about it and the NM can 
 remove the completed container from the new list. Upon re-register with the 
 RM (after RM restart) the NM should send the entire list of completed 
 containers to the RM along with any other containers that completed while the 
 RM was dead. This ensures that the RM can inform the AMs about all completed 
 containers. Some container completions may be reported more than once since 
 the AM may have pulled the container but the RM may die before notifying the 
 NM about the pull.
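
The bookkeeping described above can be sketched as follows. This is an illustration under assumptions, not NodeManager code: `CompletedContainerTracker` and its method names are hypothetical, and container IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical NM-side bookkeeping; names are illustrative, not NodeManager classes. */
class CompletedContainerTracker {
    // Completed containers not yet reported to the RM.
    private final Set<String> unreported = new LinkedHashSet<>();
    // Reported to the RM, but the RM has not yet confirmed that the AM pulled them.
    private final Set<String> awaitingAmPull = new LinkedHashSet<>();

    void containerCompleted(String containerId) {
        unreported.add(containerId);
    }

    /** Regular heartbeat: send new completions and remember them until acknowledged. */
    List<String> heartbeat() {
        List<String> report = new ArrayList<>(unreported);
        awaitingAmPull.addAll(unreported);
        unreported.clear();
        return report;
    }

    /** The RM tells the NM that the AM has pulled these containers. */
    void ackPulledByAm(List<String> containerIds) {
        awaitingAmPull.removeAll(containerIds);
    }

    /** Re-registration after an RM restart: resend everything not yet acknowledged. */
    List<String> reRegister() {
        List<String> report = new ArrayList<>(awaitingAmPull);
        report.addAll(unreported);
        awaitingAmPull.addAll(unreported);
        unreported.clear();
        return report;
    }
}
```

Because an entry is dropped only after the acknowledgement arrives, a container can be reported twice if the RM dies between the AM pull and the acknowledgement, which matches the duplicate-report caveat in the description.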





[jira] [Assigned] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1365:
---

Assignee: Anubhav Dhoot

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.





[jira] [Assigned] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1373:
---

Assignee: Anubhav Dhoot

 Transition RMApp and RMAppAttempt state to RUNNING after restart for 
 recovered running apps
 ---

 Key: YARN-1373
 URL: https://issues.apache.org/jira/browse/YARN-1373
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 Currently the RM moves recovered app attempts to a terminal recovered 
 state and starts a new attempt. Instead, it will have to transition the last 
 attempt to a running state such that it can proceed as normal once the 
 running attempt has resynced with the ApplicationMasterService (YARN-1365 and 
 YARN-1366). If the RM had started the application container before dying then 
 the AM would be up and trying to contact the RM. The RM may have died 
 before launching the container. For this case, the RM should wait for the AM 
 liveness period and issue a kill container for the stored master container. 
 It should transition this attempt to some RECOVER_ERROR state and proceed to 
 start a new attempt.





[jira] [Assigned] (YARN-1823) Recover Unmanaged AMs

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1823:
---

Assignee: Anubhav Dhoot

 Recover Unmanaged AMs
 -

 Key: YARN-1823
 URL: https://issues.apache.org/jira/browse/YARN-1823
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot

 YARN-1815 does not recover unmanaged AMs after RM restart. This JIRA is a 
 placeholder to remove that limitation and make any other necessary changes to 
 ensure unmanaged AMs continue to proceed after restart. 





[jira] [Assigned] (YARN-1369) Capacity scheduler to re-populate container allocation state

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1369:
---

Assignee: Anubhav Dhoot

 Capacity scheduler to re-populate container allocation state
 

 Key: YARN-1369
 URL: https://issues.apache.org/jira/browse/YARN-1369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.





[jira] [Assigned] (YARN-1371) FIFO scheduler to re-populate container allocation state

2014-03-21 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1371:
---

Assignee: Anubhav Dhoot

 FIFO scheduler to re-populate container allocation state
 

 Key: YARN-1371
 URL: https://issues.apache.org/jira/browse/YARN-1371
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.





[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943784#comment-13943784
 ] 

Vinod Kumar Vavilapalli commented on YARN-1854:
---

+1, YARN-1863 is in. Checking this in.

 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}





[jira] [Updated] (YARN-1776) renewDelegationToken should survive RM failover

2014-03-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1776:
--

Attachment: YARN-1776.5.patch

Uploaded a new patch to clean up the code path.

 renewDelegationToken should survive RM failover
 ---

 Key: YARN-1776
 URL: https://issues.apache.org/jira/browse/YARN-1776
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1776.1.patch, YARN-1776.2.patch, YARN-1776.3.patch, 
 YARN-1776.4.patch, YARN-1776.5.patch


 When a delegation token is renewed, two RMStateStore operations happen: 1) 
 removing the old DT, and 2) storing the new DT. If the RM fails in between 
 the two, there would be a problem.





[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-03-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943789#comment-13943789
 ] 

Karthik Kambatla commented on YARN-556:
---

Thanks for posting the design doc, [~bikassaha]. [~adhoot] and I have been 
working on this for the past few days towards an initial prototype, so we get a 
handle on all the items required.

In terms of actual work-items (JIRAs), I wonder if it makes sense to work in a 
branch. Making the AM, NM resync changes without the scheduler changes would 
break things. We can work on the scheduler changes first, so there is no caller 
and add resync later, but I suppose that would make it hard to test outside of 
unit tests.

Thoughts? 

 RM Restart phase 2 - Work preserving restart
 

 Key: YARN-556
 URL: https://issues.apache.org/jira/browse/YARN-556
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: Work Preserving RM Restart.pdf


 YARN-128 covered storing the state needed for the RM to recover critical 
 information. This umbrella jira will track changes needed to recover the 
 running state of the cluster so that work can be preserved across RM restarts.





[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation

2014-03-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943793#comment-13943793
 ] 

Jian He commented on YARN-1521:
---

- getNewApplicationId, getDelegationToken: since each call returns a new 
ID/token, I am not sure these match idempotency.
- For the register protocols, for example registerNodeManager: if the previous 
call succeeded and the RM didn't crash, but registerNodeManager is retried 
because of some network problem, then when the next call comes in, the node is 
deemed a reconnected node instead of a new node. Probably AtMostOnce?
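
The distinction under discussion can be sketched as below. The two annotations are local stand-ins declared so the example compiles on its own (Hadoop ships annotations with these names in org.apache.hadoop.io.retry), `DemoProtocol` and `RetryPolicyDemo` are hypothetical, and the annotation placement is an assumption for illustration only, not the outcome of this JIRA.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Local stand-in: a retried call has the same effect as a single call. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

/** Local stand-in: the call must not be executed more than once. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AtMostOnce {}

/** Hypothetical protocol; method names echo the comment above, signatures are simplified. */
interface DemoProtocol {
    @Idempotent
    String getApplicationReport(String appId);

    // Assumption for illustration: every call hands out a fresh ID, so it is not marked idempotent here.
    @AtMostOnce
    long getNewApplicationId();
}

class RetryPolicyDemo {
    /** A client may blindly re-send a call after failover only if the method is marked @Idempotent. */
    static boolean safeToRetryBlindly(String methodName, Class<?>... paramTypes) {
        try {
            Method m = DemoProtocol.class.getMethod(methodName, paramTypes);
            return m.isAnnotationPresent(Idempotent.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown method: " + methodName, e);
        }
    }
}
```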

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

 After YARN-1028, we added automatic failover to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.





[jira] [Assigned] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-1577:
-

Assignee: Jian He  (was: Naren Koneru)

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-1577.1.patch, YARN-1577.2.patch


 Today the unmanaged AM client waits for the app state to be Accepted before 
 launching the AM. This is broken since YARN-1493 changed the attempt to start 
 only after the application is Accepted. We may need to introduce an attempt 
 state report that the client can rely on to query the attempt state and choose 
 when to launch the unmanaged AM.





[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-03-21 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-1864:
-

Description: 
In Fair Scheduler, we want to be able to create user queues under any parent 
queue in the hierarchy. For example, if user1 submits a job to a parent queue 
called root.allUserQueues, we want to be able to create a new queue called 
root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
by this user to root.allUserQueues will be run in this newly created 
root.allUserQueues.user1.
This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
creates user queues under the root queue. But we want the ability to create 
user queues under ANY parent queue.

Why do we want this?
1. Preemption: these dynamically created user queues can preempt each other if 
their fair share is not met, so there is fairness among users.
User queues that are below their fair share can also preempt other non-user 
leaf queues.
2. Allocation to user queues: we want all the ad hoc user queries to consume 
only a fraction of the resources in the shared cluster. With this feature, we 
could do that by giving a fair share to the parent user queue, which is then 
redistributed to all the dynamically created user queues.

  was:
In Fair Scheduler, we want to be able to create user queues under any parent 
queue in the hierarchy. For example, if user1 submits a job to a parent queue 
called root.allUserQueues, we want to be able to create a new queue called 
root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
by this user to root.allUserQueues will be run in this newly created 
root.allUserQueues.user1.
This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
creates user queues under the root queue. But we want the ability to create 
user queues under ANY parent queue.

Why do we want this?
1. Preemption: these dynamically created user queues can preempt each other if 
their fair share is not met, so there is fairness among users.
2. Allocation to user queues: we want all the ad hoc user queries to consume 
only a fraction of the resources in the shared cluster. With this feature, we 
could do that by giving a fair share to the parent user queue, which is then 
redistributed to all the dynamically created user queues.


 Fair Scheduler Dynamic Hierarchical User Queues
 ---

 Key: YARN-1864
 URL: https://issues.apache.org/jira/browse/YARN-1864
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.4.0

 Attachments: YARN-1864-v1.txt


 In Fair Scheduler, we want to be able to create user queues under any parent 
 queue in the hierarchy. For example, if user1 submits a job to a parent queue 
 called root.allUserQueues, we want to be able to create a new queue called 
 root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
 by this user to root.allUserQueues will be run in this newly created 
 root.allUserQueues.user1.
 This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
 creates user queues under the root queue. But we want the ability to create 
 user queues under ANY parent queue.
 Why do we want this?
 1. Preemption: these dynamically created user queues can preempt each other 
 if their fair share is not met, so there is fairness among users.
 User queues that are below their fair share can also preempt other non-user 
 leaf queues.
 2. Allocation to user queues: we want all the ad hoc user queries to consume 
 only a fraction of the resources in the shared cluster. With this 
 feature, we could do that by giving a fair share to the parent user queue, 
 which is then redistributed to all the dynamically created user queues.
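
The naming rule in the description can be sketched in a few lines of Java. This is only an illustration, not the attached patch: `UserQueueNamer`, `resolve`, and `queueCount` are hypothetical names, and real Fair Scheduler queue placement involves more than string concatenation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of Fair Scheduler: maps (parent queue, user) to a per-user leaf queue. */
class UserQueueNamer {
    // Queues created so far, kept only to show that repeat submissions reuse the same queue.
    private final Set<String> createdQueues = new LinkedHashSet<>();

    /** Returns the queue a job should run in, creating "<parent>.<user>" on first use. */
    String resolve(String parentQueue, String user) {
        if (parentQueue == null || parentQueue.isEmpty() || user == null || user.isEmpty()) {
            throw new IllegalArgumentException("parent queue and user must be non-empty");
        }
        String queue = parentQueue + "." + user;
        createdQueues.add(queue); // no-op if the queue already exists
        return queue;
    }

    int queueCount() {
        return createdQueues.size();
    }
}
```

Since '.' separates levels of the queue hierarchy, a user name containing a '.' would need escaping, which this sketch does not attempt.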





[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943800#comment-13943800
 ] 

Vinod Kumar Vavilapalli commented on YARN-1577:
---

Quickly scanned the patch. It looks like an existing bug, but it looks like 
even if the app fails immediately for some reason, client will still be stuck 
for 10min wait timeout.

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-1577.1.patch, YARN-1577.2.patch


 Today the unmanaged AM client waits for the app state to be Accepted before 
 launching the AM. This is broken since YARN-1493 changed the attempt to start 
 only after the application is Accepted. We may need to introduce an attempt 
 state report that the client can rely on to query the attempt state and choose 
 when to launch the unmanaged AM.





[jira] [Comment Edited] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943800#comment-13943800
 ] 

Vinod Kumar Vavilapalli edited comment on YARN-1577 at 3/22/14 12:52 AM:
-

Quickly scanned the patch. It looks like an existing bug, but this patch may 
worsen it - even if the app fails immediately for some reason, client will 
still be stuck for the 10min wait timeout.
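
A minimal sketch of the behaviour being asked for: a wait loop that returns as soon as the app reaches a terminal state instead of running out the full timeout. The enum, `AppStatePoller`, and `waitForState` are hypothetical names, not the unmanaged AM client code, and the poll budget stands in for the wall-clock timeout.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical, simplified application states. */
enum DemoAppState { SUBMITTED, ACCEPTED, FAILED, KILLED, FINISHED }

class AppStatePoller {
    // States from which the application will never reach the awaited state.
    private static final Set<DemoAppState> TERMINAL =
        EnumSet.of(DemoAppState.FAILED, DemoAppState.KILLED, DemoAppState.FINISHED);

    /**
     * Polls until the awaited state or a terminal state is seen, or the poll budget runs out.
     * Returns the last observed state, or null if maxPolls is not positive.
     */
    static DemoAppState waitForState(Supplier<DemoAppState> stateSource,
                                     DemoAppState awaited, int maxPolls) {
        DemoAppState last = null;
        for (int i = 0; i < maxPolls; i++) {
            last = stateSource.get();
            if (last == awaited || TERMINAL.contains(last)) {
                return last; // stop early instead of waiting out the timeout
            }
            // A real client would sleep between polls; omitted to keep the sketch fast.
        }
        return last;
    }
}
```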


was (Author: vinodkv):
Quickly scanned the patch. It looks like an existing bug, but it looks like 
even if the app fails immediately for some reason, client will still be stuck 
for 10min wait timeout.

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-1577.1.patch, YARN-1577.2.patch


 Today the unmanaged AM client waits for the app state to be Accepted before 
 launching the AM. This is broken since YARN-1493 changed the attempt to start 
 only after the application is Accepted. We may need to introduce an attempt 
 state report that the client can rely on to query the attempt state and choose 
 when to launch the unmanaged AM.





[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943806#comment-13943806
 ] 

Hudson commented on YARN-1854:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5377 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5377/])
YARN-1854. Fixed test failure in TestRMHA#testStartAndTransitions. Contributed 
by Rohith Sharma KS. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580097)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}





[jira] [Commented] (YARN-1859) WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

2014-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943803#comment-13943803
 ] 

Hudson commented on YARN-1859:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5377 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5377/])
YARN-1863. Fixed test failure in TestRMFailover after YARN-1859. Contributed by 
Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


 WebAppProxyServlet will throw ApplicationNotFoundException if the app is no 
 longer cached in RM
 ---

 Key: YARN-1859
 URL: https://issues.apache.org/jira/browse/YARN-1859
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1859.1.patch


 WebAppProxyServlet checks for null to determine whether the application was 
 found.
 {code}
   ApplicationReport applicationReport = getApplicationReport(id);
   if (applicationReport == null) {
     LOG.warn(req.getRemoteUser() + " Attempting to access " + id +
         " that was not found");
 {code}
 However, WebAppProxyServlet calls AppReportFetcher, which in turn calls 
 ClientRMService. When the application is not found, ClientRMService throws 
 ApplicationNotFoundException. Therefore, the logic in WebAppProxyServlet 
 that follows this check and creates the tracking URL for a non-cached app is 
 no longer reachable.
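
One way to keep the null-based branch reachable is to translate the exception back into null at the call site. The sketch below shows that idea under assumptions: it is not necessarily what YARN-1859.1.patch does, all `Demo*` types are hypothetical stand-ins for the Hadoop classes, and the report is reduced to a string.

```java
/** Hypothetical stand-in for ApplicationNotFoundException. */
class DemoApplicationNotFoundException extends Exception {
    DemoApplicationNotFoundException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for the fetcher that ends up calling the RM. */
interface DemoReportFetcher {
    String getApplicationReport(String appId) throws DemoApplicationNotFoundException;
}

class DemoProxy {
    private final DemoReportFetcher fetcher;

    DemoProxy(DemoReportFetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Translates "not found" into null so an existing null check keeps working. */
    String getReportOrNull(String appId) {
        try {
            return fetcher.getApplicationReport(appId);
        } catch (DemoApplicationNotFoundException e) {
            return null; // the caller treats null as "app no longer cached"
        }
    }
}
```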




