[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885133#comment-13885133
 ] 

Karthik Kambatla commented on YARN-1618:


Thanks Bikas. Yes, I verified the latest patch on a secure cluster as well and ran 
Oozie workflows against it. The RM doesn't crash anymore when the workflow is 
given the Standby RM.

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1578) Fix how to handle ApplicationHistory about the container

2014-01-29 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated YARN-1578:
-

Attachment: application_1390978867235_0001
resoucemanager.log

Thank you for your comment.

I confirmed that this problem also occurs in trunk built today. I attached the 
ResourceManager log (resourcemanager.log).
The finish data of container_1390978867235_0001_01_28 does not seem to be 
recorded in the ResourceManager log.
The finish information of this container is also not written to the history 
file (attached as application_1390978867235_0001).

With the current implementation, FileSystemApplicationHistoryStore generates 
only the startData at the point you mentioned in your comment.
A NullPointerException is then thrown in the following code because finishData 
is null.

{code}
  private static void mergeContainerHistoryData(
      ContainerHistoryData historyData, ContainerFinishData finishData) {
    historyData.setFinishTime(finishData.getFinishTime());
    historyData.setDiagnosticsInfo(finishData.getDiagnosticsInfo());
    historyData.setLogURL(finishData.getLogURL());
    historyData.setContainerExitStatus(finishData.getContainerExitStatus());
    historyData.setContainerState(finishData.getContainerState());
  }
{code}
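
A null check before merging would avoid the NPE. A minimal sketch of such a 
guard, as an assumption on my side rather than the attached YARN-1578.patch:
{code}
// Hedged sketch (not the attached patch): only merge when finish data was
// actually recorded; otherwise leave the history data started-only.
private static void mergeContainerHistoryDataIfPresent(
    ContainerHistoryData historyData, ContainerFinishData finishData) {
  if (finishData == null) {
    return; // the container never finished; nothing to merge
  }
  historyData.setFinishTime(finishData.getFinishTime());
  historyData.setDiagnosticsInfo(finishData.getDiagnosticsInfo());
  historyData.setLogURL(finishData.getLogURL());
  historyData.setContainerExitStatus(finishData.getContainerExitStatus());
  historyData.setContainerState(finishData.getContainerState());
}
{code}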


 Fix how to handle ApplicationHistory about the container
 

 Key: YARN-1578
 URL: https://issues.apache.org/jira/browse/YARN-1578
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: YARN-1578.patch, application_1390978867235_0001, 
 resoucemanager.log, screenshot.png


 I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
 After the job ended, when I accessed the HistoryServer web UI, it displayed a 
 500 error. The HistoryServer daemon log output was as follows.
 {code}
 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: 
 /applicationhistory/appattempt/appattempt_1389146249925_0008_01
 java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 (snip...)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
 at 
 org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
 (snip...)
 {code}
 From the ApplicationHistory file, I confirmed that there was a container which 
 was not finished.
 In the ResourceManager daemon log, the ResourceManager reserved this container 
 but did not allocate it.
 Therefore, it is necessary to change how ApplicationHistory handles a 
 container which is not allocated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-01-29 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885152#comment-13885152
 ] 

Shinichi Yamashita commented on YARN-925:
-

Thank you for checking the patch.
As you say, the current patch does not work well for a history with a huge 
number of applications.
I thought about a plan to add filter information to the history file name, and 
a plan to add a find command to HDFS.
However, I think your idea is simpler and better than mine.

I will think about a better method.
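
For example, one hypothetical shape a filter-accepting getApplications could 
take; the parameters below are purely an illustration, not what the attached 
patches implement:
{code}
// Hypothetical sketch only: push the filters into the reader interface so
// each store implementation can optimize them (by file naming, indexing, etc).
Map<ApplicationId, ApplicationHistoryData> getApplications(String user,
    String queue, EnumSet<YarnApplicationState> appStates, long limit)
    throws IOException;
{code}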

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Shinichi Yamashita
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
 YARN-925-8.patch


 We need to allow filter parameters for getApplications, pushing filtering to 
 the implementations of the interface. The implementations should know the 
 best about optimizing filtering. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1631) Container allocation issue in Leafqueue assignContainers()

2014-01-29 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1631:
--

Attachment: Yarn-1631.2.patch

Updated with test case to reproduce this scenario.
Please review the same.

 Container allocation issue in Leafqueue assignContainers()
 --

 Key: YARN-1631
 URL: https://issues.apache.org/jira/browse/YARN-1631
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: SuSe 11 Linux 
Reporter: Sunil G
 Attachments: Yarn-1631.1.patch, Yarn-1631.2.patch


 Application1 has a demand of 8GB per map task, which is more than Node_1 can 
 handle.
 Node_1 has a capacity of 8GB, and 2GB is used by Application1's AM.
 Hence Application1 reserved the remaining 6GB on Node_1.
 A new job, Application2, is submitted with a 2GB AM and 2GB tasks, with only 
 2 maps to run.
 Node_2 also has 8GB capacity.
 But Application2's AM cannot be launched on Node_2, and Application2 waits a 
 long time since only 2 nodes are available in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Introduce timeout for async polling operations in YarnClientImpl

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885238#comment-13885238
 ] 

Hudson commented on YARN-1630:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #465 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/465/])
YARN-1630. Introduce timeout for async polling operations in YarnClientImpl 
(Aditya Acharya via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562289)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 Introduce timeout for async polling operations in YarnClientImpl
 

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Fix For: 2.3.0

 Attachments: diff-1.txt, diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.
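 A minimal sketch of that bounding idea, as an illustration only (this is not 
 the attached diff, and the limit/interval parameters are assumptions):
 {code}
 // Poll a bounded number of times for the KILLED state instead of forever.
 static void waitForKill(org.apache.hadoop.yarn.client.api.YarnClient client,
     org.apache.hadoop.yarn.api.records.ApplicationId appId,
     int maxPolls, long pollIntervalMs) throws Exception {
   for (int i = 0; i < maxPolls; i++) {
     if (client.getApplicationReport(appId).getYarnApplicationState()
         == org.apache.hadoop.yarn.api.records.YarnApplicationState.KILLED) {
       return;
     }
     Thread.sleep(pollIntervalMs);
   }
   throw new java.io.IOException(
       "Timed out waiting for application " + appId + " to be killed");
 }
 {code}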



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885252#comment-13885252
 ] 

Jian He commented on YARN-1618:
---

patch looks good to me, +1

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885258#comment-13885258
 ] 

Jian He commented on YARN-1618:
---

I found that the NEW state can transition to FINAL_SAVING on a RECOVER event. 
This should not happen, and FINAL_SAVING should be removed from this 
transition. Can we just fix it here?

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1631) Container allocation issue in Leafqueue assignContainers()

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885267#comment-13885267
 ] 

Hadoop QA commented on YARN-1631:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625843/Yarn-1631.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2957//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2957//console

This message is automatically generated.

 Container allocation issue in Leafqueue assignContainers()
 --

 Key: YARN-1631
 URL: https://issues.apache.org/jira/browse/YARN-1631
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: SuSe 11 Linux 
Reporter: Sunil G
 Attachments: Yarn-1631.1.patch, Yarn-1631.2.patch


 Application1 has a demand of 8GB per map task, which is more than Node_1 can 
 handle.
 Node_1 has a capacity of 8GB, and 2GB is used by Application1's AM.
 Hence Application1 reserved the remaining 6GB on Node_1.
 A new job, Application2, is submitted with a 2GB AM and 2GB tasks, with only 
 2 maps to run.
 Node_2 also has 8GB capacity.
 But Application2's AM cannot be launched on Node_2, and Application2 waits a 
 long time since only 2 nodes are available in the cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Introduce timeout for async polling operations in YarnClientImpl

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885329#comment-13885329
 ] 

Hudson commented on YARN-1630:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1682 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1682/])
YARN-1630. Introduce timeout for async polling operations in YarnClientImpl 
(Aditya Acharya via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562289)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 Introduce timeout for async polling operations in YarnClientImpl
 

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Fix For: 2.3.0

 Attachments: diff-1.txt, diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Introduce timeout for async polling operations in YarnClientImpl

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885334#comment-13885334
 ] 

Hudson commented on YARN-1630:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1657 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1657/])
YARN-1630. Introduce timeout for async polling operations in YarnClientImpl 
(Aditya Acharya via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562289)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 Introduce timeout for async polling operations in YarnClientImpl
 

 Key: YARN-1630
 URL: https://issues.apache.org/jira/browse/YARN-1630
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Aditya Acharya
Assignee: Aditya Acharya
 Fix For: 2.3.0

 Attachments: diff-1.txt, diff.txt


 I ran an MR2 application that would have been long running, and killed it 
 programmatically using a YarnClient. The app was killed, but the client hung 
 forever. The message that I saw, which spammed the logs, was Watiting for 
 application application_1389036507624_0018 to be killed.
 The RM log indicated that the app had indeed transitioned from RUNNING to 
 KILLED, but for some reason future responses to the RPC to kill the 
 application did not indicate that the app had been terminated.
 I tracked this down to YarnClientImpl.java, and though I was unable to 
 reproduce the bug, I wrote a patch to introduce a bound on the number of 
 times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885429#comment-13885429
 ] 

Hudson commented on YARN-1600:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5058 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5058/])
YARN-1600. RM does not startup when security is enabled without spnego 
configured. Contributed by Haohui Mai (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java


 RM does not startup when security is enabled without spnego configured
 --

 Key: YARN-1600
 URL: https://issues.apache.org/jira/browse/YARN-1600
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Haohui Mai
Priority: Blocker
 Attachments: YARN-1600.000.patch


 We have a custom auth filter in front of our various UI pages that handles 
 user authentication.  However currently the RM assumes that if security is 
 enabled then the user must have configured spnego as well for the RM web 
 pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885506#comment-13885506
 ] 

Jonathan Eagles commented on YARN-1632:
---

+1. Simple fix. Thanks, Chen.

 TestApplicationMasterServices should be under 
 org.apache.hadoop.yarn.server.resourcemanager package
 ---

 Key: YARN-1632
 URL: https://issues.apache.org/jira/browse/YARN-1632
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9, 2.2.0
Reporter: Chen He
Assignee: Chen He
Priority: Minor
 Attachments: yarn-1632v2.patch


 ApplicationMasterService is under 
 org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
 file TestApplicationMasterService is placed under 
 org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
 package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1670) aggregated log writer can write more log data then it says is the log length

2014-01-29 Thread Thomas Graves (JIRA)
Thomas Graves created YARN-1670:
---

 Summary: aggregated log writer can write more log data then it 
says is the log length
 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0, 0.23.10
Reporter: Thomas Graves


We have seen exceptions when using 'yarn logs' to read log files.
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
    at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)


We traced it down to the reader trying to read the file type of the next file, 
but the position it reads from is still log data from the previous file. What 
happened was that the Log Length was written as a certain size, but the log 
data was actually longer than that.

Inside the write() routine in LogValue, it first writes what the log file 
length is, but then when it goes to write the log itself it just copies to the 
end of the file. There is a race condition here: if someone is still writing 
to the file when it gets aggregated, the length written could be too small.

We should have the write() routine stop once it has written whatever it said 
the length was. It would be nice if we could somehow tell the user the log 
might be truncated, but I'm not sure of a good way to do this.
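
A minimal sketch of that bounding, purely to illustrate the idea (not a patch 
against LogValue; the helper name is made up):
{code}
// Copy at most the number of bytes that was recorded as the log length, even
// if the underlying log file has grown since the length was written.
static void copyAtMostDeclaredLength(java.io.InputStream in,
    java.io.DataOutputStream out, long declaredLength)
    throws java.io.IOException {
  byte[] buf = new byte[64 * 1024];
  long remaining = declaredLength;
  while (remaining > 0) {
    int read = in.read(buf, 0, (int) Math.min(buf.length, remaining));
    if (read == -1) {
      break; // the file ended early; stop rather than block
    }
    out.write(buf, 0, read);
    remaining -= read;
  }
}
{code}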

We also noticed a bug in readAContainerLogsForALogType where it uses an int 
for curRead whereas it should use a long.

  while (len != -1 && curRead < fileLength) {

This isn't actually a problem right now, as it looks like the underlying 
decoder is doing the right thing and the len condition exits the loop.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885560#comment-13885560
 ] 

Karthik Kambatla commented on YARN-1618:


Thanks for the review, Jian. 

bq. I found that the NEW state can transition to FINAL_SAVING on a RECOVER 
event. This should not happen, and FINAL_SAVING should be removed from this 
transition.
I suppose you are referring to the following transition. An RMAppEvent of type 
RECOVER is created only when recovering applications, which means the 
application is already in the store. For these applications, I am not sure if 
we should save the state of this second attempt or not. I don't think either 
approach would lead to store issues as reported here.
{code}
.addTransition(RMAppState.NEW, EnumSet.of(RMAppState.SUBMITTED,
    RMAppState.ACCEPTED, RMAppState.FINISHED, RMAppState.FAILED,
    RMAppState.KILLED, RMAppState.FINAL_SAVING),
    RMAppEventType.RECOVER, new RMAppRecoveredTransition())
{code}
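
For illustration, removing FINAL_SAVING from that target set would look 
roughly as below; whether we actually want that is exactly what needs more 
thought, so treat this as a sketch rather than the fix:
{code}
// Hypothetical sketch only: the RECOVER transition without FINAL_SAVING as a
// possible target state.
.addTransition(RMAppState.NEW, EnumSet.of(RMAppState.SUBMITTED,
    RMAppState.ACCEPTED, RMAppState.FINISHED, RMAppState.FAILED,
    RMAppState.KILLED),
    RMAppEventType.RECOVER, new RMAppRecoveredTransition())
{code}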

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885574#comment-13885574
 ] 

Karthik Kambatla commented on YARN-1618:


[~bikassaha], [~jianhe] - if we need to spend more time on addressing Jian's 
comment on recovered applications, are you okay with addressing it in a 
follow-up JIRA? 


 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885576#comment-13885576
 ] 

Bikas Saha commented on YARN-1618:
--

Yeah, let's do it in a separate JIRA for clarity.

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1659) Define ApplicationTimelineStore interface and store-facing entity, entity-info and event objects

2014-01-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1659:
--

Attachment: YARN-1659.2.patch

Uploaded a new patch with some slight adjustments to the class names.

 Define ApplicationTimelineStore interface and store-facing entity, 
 entity-info and event objects
 

 Key: YARN-1659
 URL: https://issues.apache.org/jira/browse/YARN-1659
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1659-1.patch, YARN-1659.2.patch


 These will be used by ApplicationTimelineStore interface.  The web services 
 will convert the store-facing obects to the user-facing objects.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885578#comment-13885578
 ] 

Karthik Kambatla commented on YARN-1618:


Thanks Bikas. I'll create a separate JIRA for that, and go ahead and commit 
this then.

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885582#comment-13885582
 ] 

Karthik Kambatla commented on YARN-1618:


Filed YARN-1671. 

 Applications transition from NEW to FINAL_SAVING, and try to update 
 non-existing entries in the state-store
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1671) Revisit RMApp transitions from NEW on RECOVER

2014-01-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1671:
---

Issue Type: Sub-task  (was: Bug)
Parent: YARN-128

 Revisit RMApp transitions from NEW on RECOVER
 -

 Key: YARN-1671
 URL: https://issues.apache.org/jira/browse/YARN-1671
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla

 As discussed on YARN-1618, while recovering applications on restart, NEW -> 
 FINAL_SAVING transition is possible. Revisit this to make sure we want this 
 transition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1671) Revisit RMApp transitions from NEW on RECOVER

2014-01-29 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1671:
--

 Summary: Revisit RMApp transitions from NEW on RECOVER
 Key: YARN-1671
 URL: https://issues.apache.org/jira/browse/YARN-1671
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla


As discussed on YARN-1618, while recovering applications on restart, NEW -> 
FINAL_SAVING transition is possible. Revisit this to make sure we want this 
transition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1618) Fix invalid transition from NEW to FINAL_SAVING

2014-01-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1618:
---

Summary: Fix invalid transition from NEW to FINAL_SAVING  (was: 
Applications transition from NEW to FINAL_SAVING, and try to update 
non-existing entries in the state-store)

 Fix invalid transition from NEW to FINAL_SAVING
 ---

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING

2014-01-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1618:
---

Summary: Fix invalid RMApp transition from NEW to FINAL_SAVING  (was: Fix 
invalid transition from NEW to FINAL_SAVING)

 Fix invalid RMApp transition from NEW to FINAL_SAVING
 -

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING

2014-01-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1618:
---

Attachment: yarn-1618-branch-2.3.patch

Patch for branch-2.3. The patch is functionally the same; there were trivial 
conflicts due to the YARN-321 merge changes to TestRMAppTransitions. 

 Fix invalid RMApp transition from NEW to FINAL_SAVING
 -

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch, 
 yarn-1618-branch-2.3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885612#comment-13885612
 ] 

Hudson commented on YARN-1618:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5059 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5059/])
YARN-1618. Fix invalid RMApp transition from NEW to FINAL_SAVING (kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 Fix invalid RMApp transition from NEW to FINAL_SAVING
 -

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch, 
 yarn-1618-branch-2.3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Fix invalid RMApp transition from NEW to FINAL_SAVING

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885617#comment-13885617
 ] 

Hadoop QA commented on YARN-1618:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12625911/yarn-1618-branch-2.3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2958//console

This message is automatically generated.

 Fix invalid RMApp transition from NEW to FINAL_SAVING
 -

 Key: YARN-1618
 URL: https://issues.apache.org/jira/browse/YARN-1618
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch, 
 yarn-1618-branch-2.3.patch


 YARN-891 augments the RMStateStore to store information on completed 
 applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
 This leads to the RM trying to update entries in the state-store that do not 
 exist. On ZKRMStateStore, this leads to the RM crashing. 
 Previous description:
 ZKRMStateStore fails to handle updates to znodes that don't exist. For 
 instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
 In these cases, the store should create the missing znode and handle the 
 update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1672) YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds

2014-01-29 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1672:
--

 Summary: YarnConfiguration is missing a default for 
yarn.nodemanager.log.retain-seconds
 Key: YARN-1672
 URL: https://issues.apache.org/jira/browse/YARN-1672
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial


YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds
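
For context, a sketch of the kind of constant that is missing; the key already 
exists in YarnConfiguration, and the default value below is my reading of 
yarn-default.xml, so treat it as an assumption until the patch is posted:
{code}
// Existing key in YarnConfiguration:
public static final String NM_LOG_RETAIN_SECONDS =
    NM_PREFIX + "log.retain-seconds";
// Missing default, assumed to mirror yarn-default.xml (3 hours = 10800 s):
public static final long DEFAULT_NM_LOG_RETAIN_SECONDS = 3 * 60 * 60;
{code}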



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+eventies

2014-01-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1636:
--

Attachment: YARN-1636.1.patch

Uploaded a patch which contains the REST API service part. Its test cases are 
held off until the in-memory implementation of ApplicationTimelineStore is 
ready.

 Implement timeline related web-services inside AHS for storing and retrieving 
 entities+eventies
 ---

 Key: YARN-1636
 URL: https://issues.apache.org/jira/browse/YARN-1636
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-1636.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+eventies

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885666#comment-13885666
 ] 

Hadoop QA commented on YARN-1636:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625924/YARN-1636.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2959//console

This message is automatically generated.

 Implement timeline related web-services inside AHS for storing and retrieving 
 entities+eventies
 ---

 Key: YARN-1636
 URL: https://issues.apache.org/jira/browse/YARN-1636
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-1636.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1628) TestContainerManagerSecurity fails on trunk

2014-01-29 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885677#comment-13885677
 ] 

Daryn Sharp commented on YARN-1628:
---

+1.  Will check in later today.  Thanks!

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-1628
 URL: https://issues.apache.org/jira/browse/YARN-1628
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-1628.patch


 The Test fails with the following error
 {noformat}
 java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
   at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
   at 
 org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
   at 
 org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1639:


Attachment: YARN-1639.4.patch

Added a test case for the case where RM_HA_ID cannot be found.

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1639.1.patch, YARN-1639.2.patch, YARN-1639.3.patch, 
 YARN-1639.4.patch


 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
 want to be first or second.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs, and it would be better 
 to have the same setup for the RM as this would make installation and managing 
 easier.
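 For reference, the per-host setting this refers to, shown programmatically as 
 an illustration only (in practice it lives in each host's yarn-site.xml):
 {code}
 // Illustration: the one value that currently has to differ between RM hosts.
 org.apache.hadoop.conf.Configuration conf =
     new org.apache.hadoop.yarn.conf.YarnConfiguration();
 conf.set(org.apache.hadoop.yarn.conf.YarnConfiguration.RM_HA_ID,
     "rm1"); // must be "rm2" on the other RM host
 {code}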



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1578) Fix how to handle ApplicationHistory about the container

2014-01-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885709#comment-13885709
 ] 

Zhijie Shen commented on YARN-1578:
---

[~sinchii], thanks for your investigation. In my previous comment, I meant it 
should be fine if the finish data of a container is not written by the RM. In 
that case, the finish data should not exist in the persisted history file. 
Therefore, in the following code,
{code}
if (entry.key.id.equals(containerId.toString())) {
  if (entry.key.suffix.equals(START_DATA_SUFFIX)) {
    ContainerStartData startData =
        parseContainerStartData(entry.value);
    mergeContainerHistoryData(historyData, startData);
    readStartData = true;
  } else if (entry.key.suffix.equals(FINISH_DATA_SUFFIX)) {
    ContainerFinishData finishData =
        parseContainerFinishData(entry.value);
    mergeContainerHistoryData(historyData, finishData);
    readFinishData = true;
  }
}
{code}
the second inner condition is supposed to fail. However, it seems that the 
second inner condition passed, while the entry was actually not the byte[] 
from which a finish data instance can be constructed.

 Fix how to handle ApplicationHistory about the container
 

 Key: YARN-1578
 URL: https://issues.apache.org/jira/browse/YARN-1578
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: YARN-1578.patch, application_1390978867235_0001, 
 resoucemanager.log, screenshot.png


 I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
 After the job ended, when I accessed the HistoryServer web UI, it displayed a 
 500 error. The HistoryServer daemon log output was as follows.
 {code}
 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: 
 /applicationhistory/appattempt/appattempt_1389146249925_0008_01
 java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 (snip...)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
 at 
 org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
 (snip...)
 {code}
 From the ApplicationHistory file, I confirmed that there was a container which 
 was not finished.
 In the ResourceManager daemon log, the ResourceManager reserved this container 
 but did not allocate it.
 Therefore, it is necessary to change how ApplicationHistory handles a 
 container which is not allocated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Tassapol Athiapinya (JIRA)
Tassapol Athiapinya created YARN-1673:
-

 Summary: Valid yarn kill application prints out help message.
 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Priority: Critical
 Fix For: 2.4.0


yarn application -kill <application ID>
used to work previously. In 2.4.0 it prints out the help message and does not 
kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1577:
-

Target Version/s: 2.3.0  (was: )

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker

 Today the unmanaged AM client waits for the app state to be Accepted before 
 launching the AM. This is broken since, in YARN-1493, we changed the RM to 
 start the attempt after the application is Accepted. We may need to introduce 
 an attempt state report that the client can rely on to query the attempt 
 state and choose to launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1506:
-

Target Version/s: 2.3.0  (was: )

Setting the target version to 2.3.0 since it was originally targeted at 2.4.0, 
and 2.3 is the new 2.4. If this isn't really a blocker for the 2.3.0 release, 
please either target it to a later version or downgrade the priority.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1444) RM crashes when node resource request sent without corresponding rack request

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1444:
-

Target Version/s: 2.3.0  (was: )

 RM crashes when node resource request sent without corresponding rack request
 -

 Key: YARN-1444
 URL: https://issues.apache.org/jira/browse/YARN-1444
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Robert Grandl
Assignee: Wangda Tan
Priority: Blocker
 Attachments: yarn-1444.ver1.patch


 I have tried to force reducers to execute on certain nodes. What I did is, 
 for reduce tasks, I changed 
 RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, 
 req.capability) to RMContainerRequestor#addResourceRequest(req.priority, 
 HOST_NAME, req.capability). 
 However, this change led to RM crashes with the following exception when 
 reducers need to be assigned:
 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
 at java.lang.Thread.run(Thread.java:722)
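
Until a scheduler-side fix lands, the usual way to express node locality from an AM without producing an inconsistent request set is to let AMRMClient generate the rack- and ANY-level requests alongside the node-level one. A minimal sketch using the public AMRMClient API (node and rack names are placeholders):

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class NodeLocalityRequestExample {
  static void requestContainerOnNode(AMRMClient<ContainerRequest> amRmClient) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(2048);
    capability.setVirtualCores(1);
    Priority priority = Priority.newInstance(10);
    // Passing both nodes and racks lets AMRMClient generate the node-, rack-
    // and ANY-level ResourceRequests together, so the scheduler never sees a
    // node-local request without its enclosing rack/ANY requests.
    ContainerRequest request = new ContainerRequest(
        capability,
        new String[] {"host1.example.com"},  // placeholder node name
        new String[] {"/default-rack"},      // placeholder rack name
        priority);
    amRmClient.addContainerRequest(request);
  }
}
{code}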



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1602) All failed RMStateStore operations should not be RMFatalEvents

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1602:
-

Target Version/s: 2.3.0  (was: )

 All failed RMStateStore operations should not be RMFatalEvents
 --

 Key: YARN-1602
 URL: https://issues.apache.org/jira/browse/YARN-1602
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical

 Currently, if a state store operation fails, depending on the exception, 
 either an RMFatalEvent.STATE_STORE_FENCED or 
 an RMFatalEvent.STATE_STORE_OP_FAILED event is created. The latter results in 
 the RM failing. Instead, we should probably kill only the application 
 corresponding to the failed store operation. 
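
As a purely hypothetical sketch of the proposed behavior (these types stand in for whatever a real patch would use and are not the RM's actual classes), a failed store operation for one application would be turned into a kill for just that application rather than an RM-wide fatal event:

{code}
// Hypothetical sketch only; not the actual RMStateStore code.
interface AppFailureNotifier {
  void killApplication(String applicationId, String diagnostics);
}

final class StoreOpFailureHandler {
  private final AppFailureNotifier notifier;

  StoreOpFailureHandler(AppFailureNotifier notifier) {
    this.notifier = notifier;
  }

  void onStoreOpFailed(String applicationId, Exception cause) {
    // Scope the damage to the one app whose state could not be persisted,
    // rather than bringing the whole RM down.
    notifier.killApplication(applicationId,
        "State-store operation failed: " + cause.getMessage());
  }
}
{code}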



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885752#comment-13885752
 ] 

Hadoop QA commented on YARN-1639:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625936/YARN-1639.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2960//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2960//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2960//console

This message is automatically generated.

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1639.1.patch, YARN-1639.2.patch, YARN-1639.3.patch, 
 YARN-1639.4.patch


 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
 want to be first or second.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs; it would be better 
 to have the same setup for the RMs, as this would make installation and management 
 easier.
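
For reference, a sketch of the per-host difference being described, assuming the standard 2.x HA properties (host names are placeholders). Everything except yarn.resourcemanager.ha.id is identical on both hosts, and that one differing value is what this JIRA wants to eliminate:

{code}
import org.apache.hadoop.conf.Configuration;

public class RmHaConfigSketch {
  // Everything except yarn.resourcemanager.ha.id is identical on both RM
  // hosts; ha.id is the one value that must differ today ("rm1" on the first
  // host, "rm2" on the second). Host names are placeholders.
  static Configuration buildConf(String thisRmId) {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");
    conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");
    conf.set("yarn.resourcemanager.ha.id", thisRmId); // differs per host
    return conf;
  }
}
{code}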



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1206:
-

Target Version/s: 2.3.0  (was: )

 Container logs link is broken on RM web UI after application finished
 -

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Priority: Blocker

 With log aggregation disabled, when container is running, its logs link works 
 properly, but after the application is finished, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1611:


Attachment: YARN-1611.4.patch

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885756#comment-13885756
 ] 

Xuan Gong commented on YARN-1611:
-

Created a patch that contains only the RemoteConfiguration functionality and the 
refresh for the scheduler configuration.

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-867) Isolation of failures in aux services

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-867:


Target Version/s: 2.3.0  (was: )

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results in 
 any non-IOException will cause the NM's async dispatcher to exit, as the 
 service's INIT APP event is not handled properly. 
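
As a rough illustration of the isolation being asked for (hypothetical types, not the actual patch), the NM side would catch anything the auxiliary service throws while handling an application-init event so that a bad payload cannot reach the async dispatcher:

{code}
import java.nio.ByteBuffer;

// Hypothetical types, not the actual NM code: the point is only that any
// Throwable from the service's app-init handling is caught and logged
// instead of propagating into the NM's async dispatcher.
interface AuxService {
  String getName();
  void initializeApplication(String applicationId, ByteBuffer appData);
}

final class IsolatingAuxServiceInvoker {
  void handleAppInit(AuxService service, String applicationId, ByteBuffer data) {
    try {
      service.initializeApplication(applicationId, data);
    } catch (Throwable t) {
      // One misbehaving service (or malicious app payload) must not take
      // down the whole NodeManager.
      System.err.println("Aux service " + service.getName()
          + " failed to init app " + applicationId + ": " + t);
    }
  }
}
{code}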



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885767#comment-13885767
 ] 

Hadoop QA commented on YARN-867:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606599/YARN-867.6.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2961//console

This message is automatically generated.

 Isolation of failures in aux services 
 --

 Key: YARN-867
 URL: https://issues.apache.org/jira/browse/YARN-867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
 YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
 YARN-867.sampleCode.2.patch


 Today, a malicious application can bring down the NM by sending bad data to a 
 service. For example, sending data to the ShuffleService such that it results in 
 any non-IOException will cause the NM's async dispatcher to exit, as the 
 service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1639:


Attachment: YARN-1639.5.patch

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1639.1.patch, YARN-1639.2.patch, YARN-1639.3.patch, 
 YARN-1639.4.patch, YARN-1639.5.patch


 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
 want to be first or second.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs; it would be better 
 to have the same setup for the RMs, as this would make installation and management 
 easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885772#comment-13885772
 ] 

Hadoop QA commented on YARN-1611:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625953/YARN-1611.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2962//console

This message is automatically generated.

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885809#comment-13885809
 ] 

Sandy Ryza commented on YARN-1611:
--

Just took a quick look at the patch.  Right now we have a nice one-way 
relationship where configs affect the services, but the services do not affect 
configs.  The patch appears to have the RM deleting files from and uploading files to the 
remote config directory, which makes me nervous.  Would it make sense for the 
admin to be responsible for placing configs in the remote dir, and for the RMs just 
to be responsible for pulling them down?

Also, a couple other questions:
* Will the existing way of doing things (writing files to disk for RMs and 
calling refresh on both) still be supported?
* Will the remote configuration be supported for the non-HA case?

We should make it possible to configure things the same way for HA and non-HA.
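
To make the "RMs only pull" model concrete, a minimal sketch assuming the admin has already placed yarn-site.xml in a shared remote directory (the path is a placeholder; this is not the patch's actual code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteConfPullSketch {
  // On refresh*, the RM reads the admin-managed copy of the configuration
  // from the shared directory and never writes to it.
  static Configuration pullRemoteConf(Configuration base) throws Exception {
    Path remoteConf =
        new Path("hdfs://nn.example.com:8020/yarn/conf/yarn-site.xml"); // placeholder path
    FileSystem fs = remoteConf.getFileSystem(base);
    Configuration refreshed = new Configuration(base);
    refreshed.addResource(fs.open(remoteConf)); // overlay the remote file
    return refreshed;
  }
}
{code}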

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-1673:
-

Assignee: Vinod Kumar Vavilapalli

Looking at this quickly.

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 2.4.0


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885832#comment-13885832
 ] 

Hadoop QA commented on YARN-1639:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625959/YARN-1639.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2963//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2963//console

This message is automatically generated.

 YARM RM HA requires different configs on different RM hosts
 ---

 Key: YARN-1639
 URL: https://issues.apache.org/jira/browse/YARN-1639
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1639.1.patch, YARN-1639.2.patch, YARN-1639.3.patch, 
 YARN-1639.4.patch, YARN-1639.5.patch


 We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
 want to be first or second.
 This means we have different configs on different RM nodes. This is unlike 
 HDFS HA, where the same configs are pushed to both NNs; it would be better 
 to have the same setup for the RMs, as this would make installation and management 
 easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1673:
--

Priority: Blocker  (was: Critical)

I tracked it down to YARN-967. Fixing this.

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 2.4.0


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1673:
--

Attachment: YARN-1673.txt

This works on my setup. [~mayank_bansal], can you verify why this set was put 
in the first place? Tx.

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1673.txt


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885874#comment-13885874
 ] 

Mayank Bansal commented on YARN-1673:
-

This will break the CLI for the history server; let me take a look at this.

Thanks,
Mayank

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1673.txt


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885871#comment-13885871
 ] 

Xuan Gong commented on YARN-1611:
-

Thanks for comments, [~sandyr] 

bq. Just took a quick look at the patch. Right now we have a nice one way 
relationship where configs affect the services, but the services do not affect 
configs. The patch appears to have the RM deleting and uploading files to the 
remote config directory, which makes me nervous. Would it make sense for the 
admin to be responsible for placing configs in the remote dir, and the RMs just 
be responsible for pulling them down?

Yes, you are right. I will remove the upload and delete operations from the 
RemoteConfiguration functionality.

bq. Will the existing way of doing things (writing files to disk for RMs and 
calling refresh on both) still be supported?

Yes, it is still supported. In the HA case, if there is no remote configuration, it 
will give a warning message. 

bq. Will the remote configuration be supported for the non-HA case?

Currently, no. This is for the HA case. 

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-1673:
---

Assignee: Mayank Bansal  (was: Vinod Kumar Vavilapalli)

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Mayank Bansal
Priority: Blocker
 Attachments: YARN-1673.txt


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1673:
--

Fix Version/s: (was: 2.4.0)

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Mayank Bansal
Priority: Blocker
 Attachments: YARN-1673.txt


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1611:


Attachment: YARN-1611.5.patch

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1674) Application launch gets stuck in ACCEPTED state

2014-01-29 Thread Trupti Dhavle (JIRA)
Trupti Dhavle created YARN-1674:
---

 Summary: Application launch gets stuck in ACCEPTED state
 Key: YARN-1674
 URL: https://issues.apache.org/jira/browse/YARN-1674
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Trupti Dhavle


During a test run, it was seen that one of the applications never started 
running. It was stuck in the ACCEPTED state although the RM UI showed that the 
cluster had enough resources to run the application.
Even the subsequent apps got stuck.

From the logs:
{noformat}
2014-01-29 11:53:36,030 INFO  capacity.ParentQueue 
(ParentQueue.java:assignContainers(583)) - assignedContainer queue=root 
usedCapacity=0.5 absoluteUsedCapacity=0.5 used=memory:4096, vCores:2 
cluster=memory:8192, vCores:8
2014-01-29 11:53:36,031 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:handle(716)) - Error in handling event type 
CONTAINER_ALLOCATED for applicationAttempt application_1390987787623_0264
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:819)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:804)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:643)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:102)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:695)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-01-29 11:53:37,876 INFO  delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating 
password for identifier: owner=hrt_qa, renewer=oozi
{noformat}
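
The stack trace points at the transition reading the first element of the allocated-container list without checking that it is non-empty. A hypothetical sketch of the guard that appears to be missing (illustrative only, not the actual RM code or fix):

{code}
import java.util.List;

final class AllocationGuardSketch {
  // The transition effectively does allocation.get(0); when the allocated
  // list is empty that throws IndexOutOfBoundsException and kills the event
  // dispatch. A guard would let the caller wait/retry instead of crashing.
  static <T> T firstAllocatedOrNull(List<T> allocatedContainers) {
    if (allocatedContainers == null || allocatedContainers.isEmpty()) {
      return null;
    }
    return allocatedContainers.get(0);
  }
}
{code}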




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-01-29 Thread Trupti Dhavle (JIRA)
Trupti Dhavle created YARN-1675:
---

 Summary: Application does not change to RUNNING after being 
scheduled
 Key: YARN-1675
 URL: https://issues.apache.org/jira/browse/YARN-1675
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Trupti Dhavle


I don't see any stack traces in the logs, but the debug logs show negative vcores:
2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(808)) - assignContainers: 
node=hor11n39.gq1.ygridcore.net #applications=5
2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0269
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
currentConsumption=2048
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0269 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0269
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
currentConsumption=2048
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0269 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0272
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0272 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0272
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0272 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0273
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0273 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0273
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0273 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0274
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 

[jira] [Updated] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-01-29 Thread Trupti Dhavle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trupti Dhavle updated YARN-1675:


Component/s: resourcemanager

 Application does not change to RUNNING after being scheduled
 

 Key: YARN-1675
 URL: https://issues.apache.org/jira/browse/YARN-1675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Trupti Dhavle

 I don't see any stack traces in the logs, but the debug logs show negative vcores:
 {noformat}
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(808)) - assignContainers: 
 node=hor11n39.gq1.ygridcore.net #applications=5
 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0269
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
 currentConsumption=2048
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0269 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0272
 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0272
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0272 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
 application_1390986573180_0273
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0273 request={Priority: 0, Capability: 
 memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
 (LeafQueue.java:assignContainers(911)) - post-assignContainers for 
 application application_1390986573180_0273
 2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
 application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
 currentConsumption=2048
 2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
 (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
 application=application_1390986573180_0273 request={Priority: 0, 

[jira] [Updated] (YARN-1675) Application does not change to RUNNING after being scheduled

2014-01-29 Thread Trupti Dhavle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trupti Dhavle updated YARN-1675:


Description: 
I don't see any stack traces in the logs, but the debug logs show negative vcores:

{noformat}
2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(808)) - assignContainers: 
node=hor11n39.gq1.ygridcore.net #applications=5
2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0269
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
currentConsumption=2048
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0269 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0269
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 
currentConsumption=2048
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0269 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0272
2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0272 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0272
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0272 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0273
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0273 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(911)) - post-assignContainers for application 
application_1390986573180_0273
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 
currentConsumption=2048
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: 
application=application_1390986573180_0273 request={Priority: 0, Capability: 
memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true}
2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue 
(LeafQueue.java:assignContainers(827)) - pre-assignContainers for application 
application_1390986573180_0274
2014-01-29 18:42:26,360 DEBUG scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: 
application=application_1390986573180_0274 headRoom=memory:16384, vCores:-3 

[jira] [Updated] (YARN-1668) Make admin refreshAdminAcls work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1668:


Attachment: YARN-1668.1.patch

Created a patch for the admin refreshAdminAcls changes. This patch is based on 
YARN-1611.

 Make admin refreshAdminAcls work across RM failover
 ---

 Key: YARN-1668
 URL: https://issues.apache.org/jira/browse/YARN-1668
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1668.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1504) RM changes for moving apps between queues

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885942#comment-13885942
 ] 

Karthik Kambatla commented on YARN-1504:


Comments:
# AbstractYarnScheduler: The exception message should specify the current 
scheduler being used, not FairScheduler (see the sketch after this list).
{code}
  @Override
  public String moveApplication(ApplicationId appId, String newQueue)
      throws YarnException {
    throw new YarnException(
        "Fair Scheduler does not support moving apps between queues");
  }
{code}
# TestClientRMService: can we add more tests to cover all the error cases being 
checked for in ClientRMService#move*()?
# We shouldn't need a new RMAppAttemptEventType for MOVE, should we?
# RMAppMoveTransition: Using futures, it is easy to regress and not set the 
exception or value of the future. Can we add comments (maybe javadoc-style) 
describing the contract honored by RMAppMoveTransition? Also, we should add 
unit tests for RMAppMoveTransition to avoid regressions in the future - verify 
that either the value or the exception is set.
# Nit: Not a fan of the field name RMAppEvent#resultFuture. Any better names? 
How about just result? 
# Nit: Would drop the RMAppState changes (comments) - they don't seem to add much. 
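
For the first point above, a minimal sketch of a scheduler-agnostic message (illustrative only, not the committed change):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SchedulerMoveMessageSketch {
  // Build the message from the concrete scheduler class instead of
  // hard-coding "Fair Scheduler".
  public String moveApplication(ApplicationId appId, String newQueue)
      throws YarnException {
    throw new YarnException(getClass().getSimpleName()
        + " does not support moving apps between queues");
  }
}
{code}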

 RM changes for moving apps between queues
 -

 Key: YARN-1504
 URL: https://issues.apache.org/jira/browse/YARN-1504
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1504.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1667) Make admin refreshSuperUserGroupsConfiguration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1667:


Attachment: YARN-1667.1.patch

Created the patch, based on YARN-1611, for the refreshSuperUserGroupsConfiguration 
changes.

 Make admin refreshSuperUserGroupsConfiguration work across RM failover
 --

 Key: YARN-1667
 URL: https://issues.apache.org/jira/browse/YARN-1667
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1667.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1676:
---

 Summary: Make admin refreshUserToGroupsMappings of configuration 
work across RM failover
 Key: YARN-1676
 URL: https://issues.apache.org/jira/browse/YARN-1676
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues

2014-01-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885973#comment-13885973
 ] 

Karthik Kambatla commented on YARN-1498:


Comments:
# AppSchedulingInfo: Not sure I understand the relevance of the following 
change to this JIRA. Am I missing something or is it just cleanup? 
{code}
-metrics.incrPendingResources(user, request.getNumContainers()
-- lastRequestContainers, Resources.subtractFrom( // save a clone
-Resources.multiply(request.getCapability(), request
-.getNumContainers()), Resources.multiply(lastRequestCapability,
-lastRequestContainers)));
+metrics.incrPendingResources(user, request.getNumContainers(),
+request.getCapability());
+metrics.decrPendingResources(user, lastRequestContainers,
+lastRequestCapability);
{code}
# Can we throw an exception instead of returning null? (A sketch follows this list.)
{code}
  @Override
  public ActiveUsersManager getActiveUsersManager() {
// Should never be called since all applications are submitted to LeafQueues
return null;
  }
{code}

Otherwise, looks good to me. 
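
For the second point above, a sketch of the suggested alternative to returning null (illustrative only, not the committed change):

{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ActiveUsersManager;

public class NonLeafQueueSketch {
  // Fail loudly instead of returning null, since this should never be called
  // for anything but a leaf queue.
  public ActiveUsersManager getActiveUsersManager() {
    throw new UnsupportedOperationException(
        "getActiveUsersManager is only meaningful for leaf queues");
  }
}
{code}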

 Common scheduler changes for moving apps between queues
 ---

 Key: YARN-1498
 URL: https://issues.apache.org/jira/browse/YARN-1498
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1498-1.patch, YARN-1498.patch, YARN-1498.patch


 This JIRA is to track changes that aren't in particular schedulers but that 
 help them support moving apps between queues.  In particular, it makes sure 
 that QueueMetrics are properly updated when an app changes queue.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover

2014-01-29 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1676:


Attachment: YARN-1676.1.patch

Created the patch, based on YARN-1611, for the refreshUserToGroupsMappings changes.

 Make admin refreshUserToGroupsMappings of configuration work across RM 
 failover
 ---

 Key: YARN-1676
 URL: https://issues.apache.org/jira/browse/YARN-1676
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1676.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1611) Make admin refresh of configuration work across RM failover

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886015#comment-13886015
 ] 

Hadoop QA commented on YARN-1611:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625987/YARN-1611.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2964//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2964//console

This message is automatically generated.

 Make admin refresh of configuration work across RM failover
 ---

 Key: YARN-1611
 URL: https://issues.apache.org/jira/browse/YARN-1611
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1611.1.patch, YARN-1611.2.patch, YARN-1611.2.patch, 
 YARN-1611.3.patch, YARN-1611.3.patch, YARN-1611.4.patch, YARN-1611.5.patch


 Currently, if we do refresh* for a standby RM, it will fail over to the 
 current active RM and do the refresh* based on the local configuration file 
 of the active RM. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1661) AppMaster logs says failing even if an application does succeed.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1661:
--

Fix Version/s: (was: 2.3.0)
 Assignee: Vinod Kumar Vavilapalli

 AppMaster logs says failing even if an application does succeed.
 

 Key: YARN-1661
 URL: https://issues.apache.org/jira/browse/YARN-1661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli

 Run:
 /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributed shell jar> -shell_command ls
 Open the AM logs. The last line indicates AM failure even though the container 
 logs print a good ls result.
 {code}
 2014-01-24 21:45:29,592 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:finish(599)) - Application completed. Signalling 
 finish to RM
 2014-01-24 21:45:29,612 INFO  [main] impl.AMRMClientImpl 
 (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for 
 application to be successfully unregistered.
 2014-01-24 21:45:29,816 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:main(267)) - Application Master failed. exiting
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1661) AppMaster logs says failing even if an application does succeed.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886292#comment-13886292
 ] 

Vinod Kumar Vavilapalli commented on YARN-1661:
---

This was broken by YARN-1566.

 AppMaster logs says failing even if an application does succeed.
 

 Key: YARN-1661
 URL: https://issues.apache.org/jira/browse/YARN-1661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli

 Run:
 /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributed shell jar> -shell_command ls
 Open the AM logs. The last line indicates AM failure even though the container 
 logs print a good ls result.
 {code}
 2014-01-24 21:45:29,592 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:finish(599)) - Application completed. Signalling 
 finish to RM
 2014-01-24 21:45:29,612 INFO  [main] impl.AMRMClientImpl 
 (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for 
 application to be successfully unregistered.
 2014-01-24 21:45:29,816 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:main(267)) - Application Master failed. exiting
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1661) AppMaster logs says failing even if an application does succeed.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1661:
--

Attachment: YARN-1661.txt

The issue is that {{run()}} returns the value of _success_ while the correct value 
of _success_ is only set right after that, by the {{finish()}} method of 
ApplicationMaster.

Attaching patch that fixes this.

Tested that it works on a single node cluster.
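
The shape of the bug, greatly simplified (a hypothetical sketch, not the actual ApplicationMaster code): the exit flag is read before {{finish()}} has computed it, so the reported status is stale.

{code}
public class AmExitFlagSketch {
  private boolean success = false;

  // Buggy shape: the flag is read before finish() has computed it, so the
  // caller reports failure even when the application succeeded.
  boolean runBuggy() {
    boolean result = success; // read too early
    finish();                 // only here is 'success' set correctly
    return result;
  }

  // Fixed shape: compute the final status first, then report it.
  boolean runFixed() {
    finish();
    return success;
  }

  private void finish() {
    // Stands in for "unregister from the RM and record the real outcome".
    success = true;
  }

  public static void main(String[] args) {
    AmExitFlagSketch am = new AmExitFlagSketch();
    System.out.println(am.runFixed()
        ? "Application Master completed successfully."
        : "Application Master failed. exiting");
  }
}
{code}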

 AppMaster logs says failing even if an application does succeed.
 

 Key: YARN-1661
 URL: https://issues.apache.org/jira/browse/YARN-1661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1661.txt


 Run:
 /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar <distributed shell jar> -shell_command ls
 Open the AM logs. The last line indicates AM failure even though the container 
 logs print a good ls result.
 {code}
 2014-01-24 21:45:29,592 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:finish(599)) - Application completed. Signalling 
 finish to RM
 2014-01-24 21:45:29,612 INFO  [main] impl.AMRMClientImpl 
 (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for 
 application to be successfully unregistered.
 2014-01-24 21:45:29,816 INFO  [main] distributedshell.ApplicationMaster 
 (ApplicationMaster.java:main(267)) - Application Master failed. exiting
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1673) Valid yarn kill application prints out help message.

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1673:
--

Target Version/s: 2.4.0

 Valid yarn kill application prints out help message.
 

 Key: YARN-1673
 URL: https://issues.apache.org/jira/browse/YARN-1673
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Mayank Bansal
Priority: Blocker
 Attachments: YARN-1673.txt


 yarn application -kill <Application ID> 
 used to work previously. In 2.4.0 it prints out help message and does not 
 kill the application.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-978:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-978-1.patch, YARN-978.10.patch, YARN-978.2.patch, 
 YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, 
 YARN-978.7.patch, YARN-978.8.patch, YARN-978.9.patch


 We don't have ApplicationAttemptReport and its Protobuf implementation.
 Adding that.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-947:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Defining the history data classes for the implementation of the 
 reading/writing interface
 -

 Key: YARN-947
 URL: https://issues.apache.org/jira/browse/YARN-947
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch, 
 YARN-947.4.patch, YARN-947.5.patch, YARN-947.6.patch, YARN-947.8.patch, 
 YARN-947.9.patch


 We need to define the history data classes that have the exact fields to be 
 stored, so that the implementations don't need duplicate logic to extract 
 the required information from RMApp, RMAppAttempt and 
 RMContainer.
 We use protobuf to define these classes, such that they can be serialized 
 to/deserialized from bytes, which makes them easier to persist.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1123:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Adding ContainerReport and Protobuf implementation
 -

 Key: YARN-1123
 URL: https://issues.apache.org/jira/browse/YARN-1123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, 
 YARN-1123-4.patch, YARN-1123-5.patch, YARN-1123-6.patch


 Like YARN-978, we need some client-oriented class to expose the container 
 history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-987) Adding ApplicationHistoryManager responsible for exposing reports to all clients

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-987:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Adding ApplicationHistoryManager responsible for exposing reports to all 
 clients
 

 Key: YARN-987
 URL: https://issues.apache.org/jira/browse/YARN-987
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, 
 YARN-987-4.patch, YARN-987-5.patch, YARN-987-6.patch, YARN-987-7.patch, 
 YARN-987-8.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-955:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Implementation of ApplicationHistoryProtocol
 ---

 Key: YARN-955
 URL: https://issues.apache.org/jira/browse/YARN-955
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, 
 YARN-955-4.patch, YARN-955-5.patch, YARN-955-6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1242) Script changes to start AHS as an individual process

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1242:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Script changes to start AHS as an individual process
 

 Key: YARN-1242
 URL: https://issues.apache.org/jira/browse/YARN-1242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1242-1.patch, YARN-1242-2.patch, YARN-1242-3.patch


 Add the command in yarn and yarn.cmd to start and stop AHS



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-930) Bootstrap ApplicationHistoryService module

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-930:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Bootstrap ApplicationHistoryService module
 --

 Key: YARN-930
 URL: https://issues.apache.org/jira/browse/YARN-930
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-930-20130716.1.txt, YARN-930-20130716.2.txt, 
 YARN-930-20130716.txt






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1534) TestAHSWebApp failed in YARN-321 branch

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1534:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 TestAHSWebApp failed in YARN-321 branch
 ---

 Key: YARN-1534
 URL: https://issues.apache.org/jira/browse/YARN-1534
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
 Environment: CentOS 6.3, JDK 1.6.0_31
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Fix For: 2.4.0

 Attachments: YARN-1534.patch


 I ran the following command and confirmed the failure of TestAHSWebApp.
 {code}
 [sinchii@hdX YARN-321-test]$ mvn clean test 
 -Dtest=org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.*
 {code}
 {code}
 Running 
 org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
 Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.492 sec - 
 in 
 org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
 Running 
 org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.193 sec <<< 
 FAILURE! - in 
 org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp
 initializationError(org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp)
   Time elapsed: 0.016 sec  <<< ERROR!
 java.lang.Exception: Test class should have exactly one public zero-argument 
 constructor
 at 
 org.junit.runners.BlockJUnit4ClassRunner.validateZeroArgConstructor(BlockJUnit4ClassRunner.java:144)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.validateConstructor(BlockJUnit4ClassRunner.java:121)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.collectInitializationErrors(BlockJUnit4ClassRunner.java:101)
 at org.junit.runners.ParentRunner.validate(ParentRunner.java:344)
 (*snip*)
 {code}
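 For reference, a minimal sketch (hypothetical test class) of what triggers this JUnit 
 validation error and one common remedy: the default BlockJUnit4ClassRunner only accepts 
 test classes with exactly one public zero-argument constructor, so a constructor that 
 takes arguments must be paired with a runner that can supply those arguments, such as 
 Parameterized.
 {code}
 import java.util.Arrays;
 import java.util.Collection;
 import org.junit.Assert;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;

 // Without @RunWith(Parameterized.class), the argument-taking constructor below
 // makes the default runner fail with "Test class should have exactly one
 // public zero-argument constructor".
 @RunWith(Parameterized.class)
 public class ZeroArgConstructorSketchTest {

   private final int pageSize;

   public ZeroArgConstructorSketchTest(int pageSize) {
     this.pageSize = pageSize;
   }

   @Parameters
   public static Collection<Object[]> data() {
     return Arrays.asList(new Object[][] { { 1 }, { 5 } });
   }

   @Test
   public void pageSizeIsPositive() {
     Assert.assertTrue(pageSize > 0);
   }
 }
 {code}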



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1191) [YARN-321] Update artifact versions for application history service

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1191:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Update artifact versions for application history service
 ---

 Key: YARN-1191
 URL: https://issues.apache.org/jira/browse/YARN-1191
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1191-1.patch


 Compilation is failing for YARN-321 branch



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1605) Fix formatting issues with new module in YARN-321 branch

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1605:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Fix formatting issues with new module in YARN-321 branch
 

 Key: YARN-1605
 URL: https://issues.apache.org/jira/browse/YARN-1605
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1605-20140116.txt


 There are a bunch of formatting issues. I'm restricting myself to a sweep of 
 all the files in the new module.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-967) [YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-967:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Command Line Interface(CLI) for Reading Application History 
 Storage Data
 ---

 Key: YARN-967
 URL: https://issues.apache.org/jira/browse/YARN-967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-967-1.patch, YARN-967-10.patch, YARN-967-11.patch, 
 YARN-967-12.patch, YARN-967-13.patch, YARN-967-14.patch, YARN-967-2.patch, 
 YARN-967-3.patch, YARN-967-4.patch, YARN-967-5.patch, YARN-967-6.patch, 
 YARN-967-7.patch, YARN-967-8.patch, YARN-967-9.patch, YARN-967.15.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-954:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] History Service should create the webUI and wire it to 
 HistoryStorage
 

 Key: YARN-954
 URL: https://issues.apache.org/jira/browse/YARN-954
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, 
 YARN-954-v2.patch, YARN-954.4.patch, YARN-954.5.patch, YARN-954.6.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-975) Add a file-system implementation for history-storage

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-975:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Add a file-system implementation for history-storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-975.1.patch, YARN-975.10.patch, YARN-975.11.patch, 
 YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch, 
 YARN-975.6.patch, YARN-975.7.patch, YARN-975.8.patch, YARN-975.9.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1555:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Failing tests in 
 org.apache.hadoop.yarn.server.applicationhistoryservice.*
 -

 Key: YARN-1555
 URL: https://issues.apache.org/jira/browse/YARN-1555
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1555-20140102.txt


 Several tests are failing on the latest YARN-321 branch.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-935) YARN-321 branch is broken due to applicationhistoryserver module's pom.xml

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-935:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 YARN-321 branch is broken due to applicationhistoryserver module's pom.xml
 --

 Key: YARN-935
 URL: https://issues.apache.org/jira/browse/YARN-935
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-935.1.patch, YARN-935.2.patch


 The branch was created from branch-2, so 
 hadoop-yarn-server-applicationhistoryserver/pom.xml should use 
 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be 
 built correctly because of the wrong dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-974) RMContainer should collect more useful information to be recorded in Application-History

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-974:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 RMContainer should collect more useful information to be recorded in 
 Application-History
 

 Key: YARN-974
 URL: https://issues.apache.org/jira/browse/YARN-974
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-974.1.patch, YARN-974.2.patch, YARN-974.3.patch, 
 YARN-974.4.patch, YARN-974.5.patch


 To record the history of a container, users may also be interested in the 
 following information:
 1. Start Time
 2. Stop Time
 3. Diagnostic Information
 4. URL to the Log File
 5. Actually Allocated Resource
 6. Actually Assigned Node
 These should be remembered during the RMContainer's life cycle.
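 As a rough illustration only (a hypothetical, simplified holder; the real record is the 
 protobuf-backed ContainerHistoryData), the fields above could be captured along the 
 RMContainer life cycle roughly like this:
 {code}
 // Hypothetical, simplified holder for the fields listed above (illustrative only).
 public class ContainerHistorySketch {
   private long startTime;
   private long finishTime;
   private String diagnosticsInfo;
   private String logUrl;
   private String allocatedResource;   // e.g. "<memory:1024, vCores:1>"
   private String assignedNodeId;      // e.g. "host:port"

   // Populated when the container starts running.
   public void onStart(long time, String resource, String nodeId) {
     this.startTime = time;
     this.allocatedResource = resource;
     this.assignedNodeId = nodeId;
   }

   // Populated when the container completes.
   public void onFinish(long time, String diagnostics, String logUrl) {
     this.finishTime = time;
     this.diagnosticsInfo = diagnostics;
     this.logUrl = logUrl;
   }
 }
 {code}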



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-962) Update application_history_service.proto

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-962:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Update application_history_service.proto
 

 Key: YARN-962
 URL: https://issues.apache.org/jira/browse/YARN-962
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-962.1.patch


 1. Change its name to application_history_client.proto
 2. Fix the incorrect proto reference.
 3. Correct the dir in pom.xml



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1595) Test failures on YARN-321 branch

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1595:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Test failures on YARN-321 branch
 

 Key: YARN-1595
 URL: https://issues.apache.org/jira/browse/YARN-1595
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1595-20140115.1.txt, YARN-1595-20140115.txt, 
 YARN-1595-20140116.1.txt, YARN-1595-20140116.txt


 mvn test doesn't pass on YARN-321 branch anymore.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1266) Implement PB service and client wrappers for ApplicationHistoryProtocol

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1266:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Implement PB service and client wrappers for ApplicationHistoryProtocol
 ---

 Key: YARN-1266
 URL: https://issues.apache.org/jira/browse/YARN-1266
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, 
 YARN-1266-4.patch, YARN-1266-5.patch, YARN-1266-6.patch


 Adding ApplicationHistoryProtocolPBService to make the web apps work, and 
 changing the yarn script to run the AHS as a separate process.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-979:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Add more APIs related to ApplicationAttempt and Container in 
 ApplicationHistoryProtocol
 --

 Key: YARN-979
 URL: https://issues.apache.org/jira/browse/YARN-979
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-979-1.patch, YARN-979-3.patch, YARN-979-4.patch, 
 YARN-979-5.patch, YARN-979-6.patch, YARN-979.2.patch, YARN-979.7.patch


 ApplicationHistoryProtocol should have the following APIs as well:
 * getApplicationAttemptReport
 * getApplicationAttempts
 * getContainerReport
 * getContainers
 The corresponding request and response classes need to be added as well.
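 A sketch of what those additions could look like (request/response class names as implied 
 above, signatures illustrative; not necessarily the committed patch):
 {code}
 // Illustrative sketch of the additional protocol methods and their
 // request/response wrappers (names and thrown exceptions are assumptions here,
 // following the usual YARN protocol style).
 public interface ApplicationHistoryProtocolSketch {

   GetApplicationAttemptReportResponse getApplicationAttemptReport(
       GetApplicationAttemptReportRequest request) throws YarnException, IOException;

   GetApplicationAttemptsResponse getApplicationAttempts(
       GetApplicationAttemptsRequest request) throws YarnException, IOException;

   GetContainerReportResponse getContainerReport(
       GetContainerReportRequest request) throws YarnException, IOException;

   GetContainersResponse getContainers(
       GetContainersRequest request) throws YarnException, IOException;
 }
 {code}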



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1023) [YARN-321] Webservices REST API's support for Application History

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1023:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Webservices REST API's support for Application History
 -

 Key: YARN-1023
 URL: https://issues.apache.org/jira/browse/YARN-1023
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Devaraj K
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch, 
 YARN-1023.2.patch, YARN-1023.3.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-934) HistoryStorage writer interface for Application History Server

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-934:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 HistoryStorage writer interface for Application History Server
 --

 Key: YARN-934
 URL: https://issues.apache.org/jira/browse/YARN-934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch, 
 YARN-934.4.patch, YARN-934.5.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-984) [YARN-321] Move classes from applicationhistoryservice.records.pb.impl package to applicationhistoryservice.records.impl.pb

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-984:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Move classes from applicationhistoryservice.records.pb.impl 
 package to applicationhistoryservice.records.impl.pb
 ---

 Key: YARN-984
 URL: https://issues.apache.org/jira/browse/YARN-984
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 2.4.0

 Attachments: YARN-984-1.patch, YARN-984.patch


 While creating instances of the applicationhistoryservice.records.* PB records, 
 it throws a ClassNotFoundException. 
 {code:xml}
 Caused by: java.lang.ClassNotFoundException: Class 
 org.apache.hadoop.yarn.server.applicationhistoryservice.records.impl.pb.ApplicationHistoryDataPBImpl
  not found
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1619)
   at 
 org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:56)
   ... 49 more
 {code}
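 The package location matters because the record factory derives the PBImpl class name from 
 the record's package by convention (roughly package + ".impl.pb." + SimpleName + "PBImpl"). 
 A minimal sketch of that derivation (simplified; not the exact RecordFactoryPBImpl code):
 {code}
 // Simplified sketch of the name-derivation convention (illustrative only).
 public class PbImplNameSketch {

   static String pbImplClassName(String recordClassName) {
     int dot = recordClassName.lastIndexOf('.');
     return recordClassName.substring(0, dot) + ".impl.pb."
         + recordClassName.substring(dot + 1) + "PBImpl";
   }

   public static void main(String[] args) {
     System.out.println(pbImplClassName(
         "org.apache.hadoop.yarn.server.applicationhistoryservice.records.ApplicationHistoryData"));
     // Prints ...records.impl.pb.ApplicationHistoryDataPBImpl -- the name the
     // factory looks up. If the class actually lives under records.pb.impl,
     // Class.forName on this name fails with the ClassNotFoundException above,
     // which is why the classes need to move to records.impl.pb.
   }
 }
 {code}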



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1578) Fix how to handle ApplicationHistory about the container

2014-01-29 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated YARN-1578:
-

Attachment: screenshot2.pdf

I see that the code you showed is the 
FileSystemApplicationHistoryStore.getContainer(ContainerId) method.
That code is OK, and we can see the ApplicationMaster information in the web UI.

The problem is when I access the list of containers from an AppAttempt link: 
the web UI displays 500 (see the attached screenshot2.pdf).
Sorry that my earlier explanation was unclear.

For this access, the AHS calls 
FileSystemApplicationHistoryStore.getContainers(ApplicationAttemptId), and 
ContainerFinishData is not set by the following code.
{code}
HistoryFileReader hfReader =
getHistoryFileReader(appAttemptId.getApplicationId());
try {
  while (hfReader.hasNext()) {
HistoryFileReader.Entry entry = hfReader.next();
if (entry.key.id.startsWith(ConverterUtils.CONTAINER_PREFIX)) {
  if (entry.key.suffix.equals(START_DATA_SUFFIX)) {
retrieveStartFinishData(appAttemptId, entry, startFinshDataMap,
  true);
  } else if (entry.key.suffix.equals(FINISH_DATA_SUFFIX)) {
retrieveStartFinishData(appAttemptId, entry, startFinshDataMap,
  false);
  }
}
  }
  LOG.info("Completed reading history information of all containers"
  + " of application attempt " + appAttemptId);
} catch (IOException e) {
  LOG.info("Error when reading history information of some containers"
  + " of application attempt " + appAttemptId);
} finally {
  hfReader.close();
}
{code}

Considering the possibility that the finish data is not included in the 
history file, I think we should fix how the history file is read in 
FileSystemApplicationHistoryStore.
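
One possible direction, sketched with hypothetical variable names (this is not the attached 
patch), is to treat a missing ContainerFinishData as a legal state when assembling the 
container report instead of merging unconditionally:

{code}
// Hypothetical guard at the point where start and finish data are combined
// (names simplified; not the actual FileSystemApplicationHistoryStore code).
ContainerHistoryData historyData = startData;       // present once the container started
ContainerFinishData finishData = finishDataOrNull;  // may be absent from the history file
if (finishData != null) {
  mergeContainerHistoryData(historyData, finishData);
} else {
  LOG.info("No finish data found for container " + historyData.getContainerId()
      + "; reporting start data only");
}
{code}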

 Fix how to handle ApplicationHistory about the container
 

 Key: YARN-1578
 URL: https://issues.apache.org/jira/browse/YARN-1578
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Attachments: YARN-1578.patch, application_1390978867235_0001, 
 resoucemanager.log, screenshot.png, screenshot2.pdf


 I carried out PiEstimator job at Hadoop cluster which applied YARN-321.
 After the job end and when I accessed Web UI of HistoryServer, it displayed 
 500. And HistoryServer daemon log was output as follows.
 {code}
 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: 
 /applicationhistory/appattempt/appattempt_1389146249925_0008_01
 java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 (snip...)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
 at 
 org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
 (snip...)
 {code}
 I confirmed from the ApplicationHistory file that there was a container which 
 never finished.
 In the ResourceManager daemon log, the ResourceManager reserved this 
 container but did not allocate it.
 Therefore, ApplicationHistory needs to change how it handles a container that 
 was never allocated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1596) Javadoc failures on YARN-321 branch

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1596:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 Javadoc failures on YARN-321 branch
 ---

 Key: YARN-1596
 URL: https://issues.apache.org/jira/browse/YARN-1596
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1596.txt


 There are some javadoc issues on YARN-321 branch.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1379:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
 --

 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1379.txt


 Found this while merging YARN-321 to the latest branch-2. Without this, 
 compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-956) [YARN-321] Add a testable in-memory HistoryStorage

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-956:
-

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Add a testable in-memory HistoryStorage 
 ---

 Key: YARN-956
 URL: https://issues.apache.org/jira/browse/YARN-956
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-956-1.patch, YARN-956-2.patch, YARN-956-3.patch, 
 YARN-956.4.patch, YARN-956.5.patch, YARN-956.6.patch, YARN-956.7.patch, 
 YARN-956.8.patch, YARN-956.9.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1594) YARN-321 branch needs to be updated after YARN-888 pom changes

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1594:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 YARN-321 branch needs to be updated after YARN-888 pom changes
 --

 Key: YARN-1594
 URL: https://issues.apache.org/jira/browse/YARN-1594
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1594-20140113.txt, YARN-1594.txt


 YARN-888 changed the pom structure, so the latest merge to trunk breaks the 
 YARN-321 branch.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1597) FindBugs warnings on YARN-321 branch

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1597:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 FindBugs warnings on YARN-321 branch
 

 Key: YARN-1597
 URL: https://issues.apache.org/jira/browse/YARN-1597
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.4.0

 Attachments: YARN-1597.txt


 There are a bunch of FindBugs warnings on the YARN-321 branch.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1007) [YARN-321] Enhance History Reader interface for Containers

2014-01-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1007:
--

Fix Version/s: (was: YARN-321)
   2.4.0

YARN-321 branch is merged into trunk and branch-2. Setting fix-version of all 
committed patches under YARN-321 to be 2.4.0.

 [YARN-321] Enhance History Reader interface for Containers
 --

 Key: YARN-1007
 URL: https://issues.apache.org/jira/browse/YARN-1007
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Devaraj K
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1007-1.patch, YARN-1007-2.patch


 If we want to show the containers used by an application/app attempt, we need 
 two more APIs which return a collection of ContainerHistoryData for an 
 application id and an application attempt id, something like below. 
 {code:xml}
   Collection<ContainerHistoryData> getContainers(
   ApplicationAttemptId appAttemptId);
   Collection<ContainerHistoryData> getContainers(ApplicationId appId);
 {code}
 {code:xml}
   /**
* This method returns {@link Container} for specified {@link ContainerId}.
* 
* @param {@link ContainerId}
* @return {@link Container} for ContainerId
*/
   ContainerHistoryData getAMContainer(ContainerId containerId);
 {code}
 In the above API, we need to change the argument to an application attempt id, 
 or we can remove this API altogether: every attempt's history data has a 
 master container id field, so the history data can be obtained through the 
 API below if that API takes a container id as its argument.
 {code:xml}
   /**
* This method returns {@link ContainerHistoryData} for specified
* {@link ApplicationAttemptId}.
* 
* @param {@link ApplicationAttemptId}
* @return {@link ContainerHistoryData} for ApplicationAttemptId
*/
   ContainerHistoryData getContainer(ApplicationAttemptId appAttemptId);
 {code}
 Here an application attempt can use a number of containers, but we cannot 
 choose which container's history data to return. This API's argument also 
 needs to be changed to take a container id instead of an app attempt id. 
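 Putting the proposed changes together, the reader interface could look roughly like this 
 (signatures as proposed in this description, javadoc abbreviated; not necessarily the 
 committed patch):
 {code}
 // Sketch of the reader interface with the changes proposed above (illustrative).
 public interface HistoryReaderSketch {

   /** All containers of one application attempt. */
   Collection<ContainerHistoryData> getContainers(ApplicationAttemptId appAttemptId);

   /** All containers of a whole application. */
   Collection<ContainerHistoryData> getContainers(ApplicationId appId);

   /** A single container, looked up by its own id rather than the attempt id. */
   ContainerHistoryData getContainer(ContainerId containerId);

   /** The AM container of an attempt, resolved via the attempt's master container id. */
   ContainerHistoryData getAMContainer(ApplicationAttemptId appAttemptId);
 }
 {code}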



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


  1   2   >