[jira] [Commented] (YARN-321) Generic application history service

2015-03-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353825#comment-14353825
 ] 

Allen Wittenauer commented on YARN-321:
---

Looks like this should get closed out w/a fix ver of 2.4.0?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-321) Generic application history service

2014-07-28 Thread Patrick Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076080#comment-14076080
 ] 

Patrick Morton commented on YARN-321:
-

Compared to wrists well is less available status amphetamines but higher 
investigations of withdrawal. 
adderall 20 mg 
http://www.surveyanalytics.com//userimages/sub-2/2007589/3153260/29851520/7787428-29851520-stopadd3.html
 
Areas also document any reasons they have surprisingly been using in the 
information.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887647#comment-13887647
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #467 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/467/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887731#comment-13887731
 ] 

Hudson commented on YARN-321:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1684 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1684/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887745#comment-13887745
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1659 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1659/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887036#comment-13887036
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5074/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882274#comment-13882274
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #462 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/462/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561452)
* /hadoop/common/trunk
* 
/hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 

[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882297#comment-13882297
 ] 

Hudson commented on YARN-321:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1679 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1679/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561452)
* /hadoop/common/trunk
* 
/hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 

[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882302#comment-13882302
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1654 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1654/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561452)
* /hadoop/common/trunk
* 
/hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 

[jira] [Commented] (YARN-321) Generic application history service

2014-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882176#comment-13882176
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5039 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5039/])
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561452)
* /hadoop/common/trunk
* 
/hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AHSClient.java
* 

[jira] [Commented] (YARN-321) Generic application history service

2014-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882206#comment-13882206
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5040 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5040/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868043#comment-13868043
 ] 

Zhijie Shen commented on YARN-321:
--

bq. 1. Does it provide a function to set maximum files and maximum retention 
period of AppicationHistory to store in HDFS?

No, currently the FS implementation doesn't discard the historic data of the 
applications completed before sometime, answer users' requests based on all the 
stored applications. However, via REST API, users are able to filter the 
applications outside a start/finish time window.

bq. 2. When there are many AppilicationHistory in HDFS, does it not limit the 
number of the reading of ApplicationHistory?

As to REST API, the users are able to limit the number of applications that AHS 
should return. As to HDFS access, the current implementation is going to load 
all the stored applications and filtering them one-by-one, which is not a 
efficient way given a big application collection. YARN-925 is reopened to 
discuss pushing the filtering into the implementation of the history store, 
where we can prevent loading all the applications.  Meanwhile, caching 
(YARN-1322) is another way to reduce I/O.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2014-01-07 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865140#comment-13865140
 ] 

Shinichi Yamashita commented on YARN-321:
-

I confirmed attached design document. And I have two questions about 
FileSystemApplicationHistoryStore.

1. Does it provide a function to set maximum files and maximum retention period 
of AppicationHistory to store in HDFS?
2. When there are many AppilicationHistory in HDFS, does it not limit the 
number of the reading of ApplicationHistory?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-23 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855707#comment-13855707
 ] 

Robert Joseph Evans commented on YARN-321:
--

The way it currently works is based off of group permissions on a directory 
(this is from memory from a while ago so I could be off on a few things).  In 
HDFS when you create a file the group of the file is the group of the directory 
the file is a part of, similar to the sticky bit on a directory in Linux.  When 
an MR job completes it will copy it's history log file, along with a few other 
files, to a drop box like location called intermediate done and atomically 
rename it from a temp name to the final name.  The directory is world writable, 
but only readable by a special group that the history server is a part of, but 
general users are not.  The history server then wakes up periodically and will 
scan that directory for new files, when it sees new files it will move them to 
a final location that is owned by the headless history server user.  If a query 
comes in for a job that the history server is not aware of, it will also scan 
the intermediate done directory before failing.

Reading history data is done through RPC to the history server, or through the 
web interface, including RESTful APIs.  There is no supported way for an app to 
read history data directly though the file system.  I hope this helps.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856021#comment-13856021
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

Tx Zhijie for answering most of Sandy's questions, you are spot on. I'll update 
the design doc to clarify things where it isn't clear.


bq. What is the jira for app specific history data?
I just filed YARN-1530, will post more information soon.


bq. Could you describe the security requirements a bit further. Its not clear 
to everyone how everything works currently. To be clear, what exactly needs to 
be done to make apps write and read history data.
The data covered in this JIRA is generic and only RM gets to write it. The 
consumers of this data are *both* the cluster admins for historical analyses as 
well as individual apps that chose to not use features that come out of 
YARN-1530. As such, we cannot let apps write history data.


bq. How is the shared bus different from writing to a file. I would think one 
would cover the other.
Yes, writing to a file is one example of a shared-bus. I'll fix it if the doc 
is confusing w.r.t this.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-19 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853439#comment-13853439
 ] 

Sandy Ryza commented on YARN-321:
-

Thanks Vinod for the design doc.  A few additional questions:

Will the AHS serve logs?
Is my understanding correct that for the first version, the RM will not do any 
RPCs to the AHS.
Will APIs be Public / Stable?
In the first version, what will be provided in terms of web UI?
How are ACLs enforced between the AHS and RM?
Eventually we want the AHS to interact well with long-running services, right?  
For the file-based bus, have we though about an approach for how it might store 
info for applications in non-terminal states?


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853492#comment-13853492
 ] 

Zhijie Shen commented on YARN-321:
--

AFAIK,

bq. Will the AHS serve logs?

It will serve the aggregated logs.

bq. Is my understanding correct that for the first version, the RM will not do 
any RPCs to the AHS.

Yes, RM and AHS don't communicate via RPC, but the shared bus.

bq. Will APIs be Public / Stable?

Assume you're talking about RPC interface. It will be Public but Unstable.

bq. In the first version, what will be provided in terms of web UI?

* The page of a list of completed applications (FINISHED, FAILED, KILLED) in 
HistoryStorage
* The page of one application and a list of its attempts
* The page of one attempt and a list of its containers
* The page of one container
* The page of the container logs

bq. How are ACLs enforced between the AHS and RM?

Is your question how the ACLs enforced on one application when it is running by 
RM is inherited by AHS when the application is finished and stored in 
HistoryStorage? My rough idea is to persist the ACLs info into HistoryStorage 
as well. For ApplicationACLs, it seems to be much obvious. For QueueACLs, I'm 
not quite sure, may be complex and depend on scheduler.

bq. Eventually we want the AHS to interact well with long-running services, 
right? 

I think so.

bq. For the file-based bus, have we though about an approach for how it might 
store info for applications in non-terminal states?

Currently, HistoryStorage is receiving a series of events which notifies the 
updates about an application/attempt/container. The info is accumulated 
incrementally. IMO, the strategy can be applied to a long-running app with 
defining more intermediate events (now we only defines start and finish events) 
to report the progress. Then, instead of holding the events until app gets 
finished, we need to flush them in advance/periodically/..., such that users 
can get informed of latest progress of the app.

Hopefully the answers can address you questions.:-)

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838096#comment-13838096
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Will this jira been included in the next release ?

Yes, AHS is included in the plan of 2.4

bq. Another question about this jira. I found that the container logURL is 
hard-coded there, user still could not see the logs of each container ( stdout, 
stderror ). Is it on the roadmap that allow user to see the logs ? And which 
jira is tracking this ? 

Are you talking about showing aggregated logs via web interfaces? If it is, you 
may want to keep an eye on YARN-1413, which is tracking this issue.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837276#comment-13837276
 ] 

Jeff Zhang commented on YARN-321:
-

Will this jira been included in the next release ?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-12-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837281#comment-13837281
 ] 

Jeff Zhang commented on YARN-321:
-

Another question about this jira. I found that the container logURL is 
hard-coded there, user still could not see the logs of each container ( stdout, 
stderror ). Is it on the roadmap that allow user to see the logs ? And which 
jira is tracking this ? Thanks .

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-10-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790555#comment-13790555
 ] 

Robert Joseph Evans commented on YARN-321:
--

I like the diagrams, but I want to understand if the generic application 
history service is intended to replace the job history server, or to just 
augment it?

I would prefer it if we could replace the current server. Perhaps not in the 
first release, but eventually.  To make that work we would have to provide a 
way for MR specific code to come up and run inside the service, exposing both 
the current restful web service, an application specific UI, and the RPC server 
that we currently run.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-10-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790604#comment-13790604
 ] 

Zhijie Shen commented on YARN-321:
--

bq. I like the diagrams, but I want to understand if the generic application 
history service is intended to replace the job history server, or to just 
augment it?

Yes, there's some previous discussion on recording the per application type 
history data, but we plan to exclude it from the initial version of AHS. 
Eventually, we'd like to integrate JHS into AHS in some way, the details of 
which we could discuss in the follow jiras. To me, it would be better if we can 
design a common per application type plugin framework, such that we can easily 
integrate the HS of other applications on YARN.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-10-08 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789876#comment-13789876
 ] 

Sandy Ryza commented on YARN-321:
-

Was a design doc ever written up for this?  The HistoryStorageDemo.java is a 
good start for understanding some of the interfaces, but it would be helpful to 
have something that explains things like what the Application History Service's 
role is, how it interacts with the RM, and key differences and similarities 
with the Job History Server.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-10-08 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790047#comment-13790047
 ] 

Sandy Ryza commented on YARN-321:
-

Thanks Vinod and Zhijie.  Didn't see the comment.  I'm going to attach your 
outline as a pdf to make it a little easier for passers-by to learn about.  
Here's the google doc it came from if you want to edit: 
https://docs.google.com/document/d/1cNsdGyLuagR8lzfeQrAclOAd-AdkVwgST6OG8Zzp43M/edit#heading=h.15p1lkmmm9g8
 

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712546#comment-13712546
 ] 

Karthik Kambatla commented on YARN-321:
---

bq. Folks, it would be great if we have a consolidated document that describes 
the design and some details.
+1

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712796#comment-13712796
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Running as service: By default, ApplicationHistoryService will be embedded 
inside ResourceManager but will be independent enough to run as a separate 
service for scaling purposes.

IIUC, to be independent, ApplicationHistoryService should have its own event 
dispatcher, shouldn't it?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711345#comment-13711345
 ] 

Bikas Saha commented on YARN-321:
-

Folks, it would be great if we have a consolidated document that describes the 
design and some details. We can keep posting revisions of the document to this 
jira as things changes significantly. HDFS-2802 is an example. Its a little 
difficult to piece it together from multiple comments and clarifications.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709940#comment-13709940
 ] 

Karthik Kambatla commented on YARN-321:
---

The approach in HistoryStorageDemo looks good, like the fact that the schema 
goes into the tuple.

(Thinking out loud) Are we decided on who all can write to HistoryStorage?
# Single writer - RM? If so, the AMs should pass all the information to the RM. 
Need to carefully handle the HA scenarios, may be as part of the HA work.
# Both AMs and RM write to HistoryStorage directly? Writes synchronized at the 
tuple level? I am thinking of long-running services here - they might want 
AM/RM to write tuples every so often.


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709966#comment-13709966
 ] 

Mayank Bansal commented on YARN-321:


I think History storage should be written by AHS not RM in that case RM will 
have less load and will be better to scale.



 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709989#comment-13709989
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

bq. How about starting with an in-memory implementation, which is the easiest 
to do and is useful for testing.
We can have one for testing, but a file-system based impl should exist from the 
beginning.

bq. Here's more ellaboration about the per-application data. There should be 
three objects to record: RMApp, RMAppAttempt and RMContainer. .
Tx for the detailed analysis, [~zjshen]!

bq. In addition, HistoryStorage APIs may involve a lot of I/O operations such 
that the response of an API will be long. Therefore, it is likely to be good to 
make the API non-blocking.
+1

bq. Are we moving aggregated log management(i.e deletion after expiry) 
responsibility to AHS?
bq. Right now its not clear what needs to be done for log aggregation?
bq. Sorry for misunderstanding your previous question. IMHO, in the recent 
future, we're not moving the aggregated log management, but duplicate it, which 
Both AHS and JHS can serve the same aggregated logs. However, AHS and JHS see 
the same logs from different point of views. AHS simply considers them as 
container logs, no matter what application it is, while JHS know they are the 
MR job logs. Vinod Kumar Vavilapalli, would you please confirm it?
That's an interesting point about JHS knowing that they are MR job logs. But we 
don't do anything special today. I'm thinking of just pull out log-handling and 
use it in both AHS and JHS for the time being.

bq. ResourceManager will push the data to HistoryStorage after an 
application finishes in a separate thread.
bq. Is it per application or only one thread in RM?
I foresee a single thread.

bq. Isn't it be a good idea that as soon as application starts we send the 
information to AHS and let AHS write all the data published by RM for that 
application. In that case it would be very less overhead for RM.
Like I mentioned in my proposal, this could be one implementation of 
HistoryStorage.

bq. What about in the cases where RM restart or crashes in those cases RM has 
to republish all the running applications to AHS or just forget about the 
previous running apps?
This was covered. In the first cut, we'll make best efforts. Losing an app's 
data in such a corner case is bearable.

bq. (Thinking out loud) Are we decided on who all can write to HistoryStorage?
I thought this was clear. Only RM. No per-framework data or data from AMs to 
AHS directly. Yet, at the least.

bq. I think History storage should be written by AHS not RM in that case RM 
will have less load and will be better to scale.
This is an implementation detail. Details on the impl whether the RM writes 
files directly or pushes to AHS. Let's do the APIs first.

Overall, I repeat that we should separate the design for per-framework data. 
Fundamentally because the AHS is a shared service. It needs a radically 
different design from my initial thoughts. Will file a separate JIRA and post 
my thoughts there.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709991#comment-13709991
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

I just created YARN-321 branch and filing sub-tasks. Let's move ahead.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709030#comment-13709030
 ] 

Mayank Bansal commented on YARN-321:


Overall Looks good, However some points to consider

ResourceManager will push the data to HistoryStorage after an application 
finishes in a separate thread.

Is it per application or only one thread in RM?

Isn't it be a good idea that as soon as application starts we send the 
information to AHS and let AHS write all the data published by RM for that 
application. In that case it would be very less overhead for RM.

What about in the cases where RM restart or crashes in those cases RM has to 
republish all the running applications to AHS or just forget about the previous 
running apps?

Right now its not clear what needs to be done for log aggregation?





 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709098#comment-13709098
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Is it per application or only one thread in RM?

I think it should be one thread in RM.

bq. Isn't it be a good idea that as soon as application starts we send the 
information to AHS and let AHS write all the data published by RM for that 
application.

I'm afraid a number of metrics cannot be determined when an application has 
just been started, such as the finish time and the final status.



 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709192#comment-13709192
 ] 

Karthik Kambatla commented on YARN-321:
---

Few other considerations:

bq. Running as service: By default, ApplicationHistoryService will be embedded 
inside ResourceManager but will be independent enough to run as a separate 
service for scaling purposes.
Is there a reason to embed this inside the RM? I don't know if there were 
reasons for the JHS to be separate, other than it being MR-specific. If there 
were, this would be against those. No?
That said, I agree it will be easier for the user if AHS starts along with the 
RM. May be, that should be configurable and turned on by default? 

bq. Hosting/serving per-framework data is out of scope for this JIRA. 
Understand and agree it makes sense to not complicate it. However, during the 
design, it would be nice to outline (at least at a high-level) how the 
plugins can work. For the plugins to serve application-specific information, 
I suspect the RM should write this information in addition to generic YARN 
information about that application (e.g. MapReduce counters). On completion, 
can we leave a provision for the AM to write a json blob (may be, via RM) to 
{{HistoryStorage}}. In the AHS, can we leave a provision for app-plugins to 
access/use this information to render application specifics.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709262#comment-13709262
 ] 

Jason Lowe commented on YARN-321:
-

bq. Is there a reason to embed this inside the RM? I don't know if there were 
reasons for the JHS to be separate, other than it being MR-specific.

IIRC the history server was embedded in the JT back in 1.x and was only split 
out as a separate daemon to keep the RM from having a dependency on MR.

bq. That said, I agree it will be easier for the user if AHS starts along with 
the RM. May be, that should be configurable and turned on by default? 

That'd be my preference, and the proxyserver is already done this way.  One can 
run it either as part of the RM (default) or setup some configs and launch it 
separately via {{yarn proxyserver}}.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706673#comment-13706673
 ] 

Zhijie Shen commented on YARN-321:
--

bq. To start with, we will have an implementation with per-app HDFS file.

How about starting with an in-memory implementation, which is the easiest to do 
and is useful for testing.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706686#comment-13706686
 ] 

Zhijie Shen commented on YARN-321:
--

bq. ResoureManager will write per-application data to a (hopefully very) thin 
HistoryStorage layer.

Here's more ellaboration about the per-application data. There should be three 
objects to record: RMApp, RMAppAttempt and RMContainer. Bellow are properties 
of each object:

Completed Application:
* Application ID
* Application Name
* Application Type
* User
* Queue
* Submit Time
* Start Time
* Finish Time
* Diagnostics Info
* Final Application Status
* Num of Application Attempts

Completed Application Attempt:
* Application Attempt ID
* Application ID
* Host
* RPC Port
* Tracking URL
* Original Tracking URL (not sure it is necessary)
* Diagnostics Info
* Final Application Status
* Master Container ID
* Num of Containers

Completed Container:
* Container ID
* Application Attempt ID
* Final Container Status
* Resource
* Priority
* Node ID
* Log URL

Application has one-to-many relationship with Application Attempt, while 
Application Attempt has one-to-one relationship with Container.

WRT the concrete information to record, here's more idea about the interface of 
HistoryStorage. The follow APIs should be useful for RM to persist application 
history and for AHS to query it:
* IterableCompletedApplication getApplications([conditions...])
* CompletedApplication getApplication(ApplicationId)
* IterableCompletedApplicationAttempt getApplicationAttempts(ApplicationId)
* CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId)
* CompletedContainer getContainer(ApplicationAttemptId)
* CompletedContainer getContainer(ContainerId)
* void addApplication(CompletedApplication)
* void addApplicationAttempt(CompletedApplicationAttempt)
* void addContainer(CompletedContainer)

In addition, HistoryStorage APIs may involve a lot of I/O operations such that 
the response of an API will be long. Therefore, it is likely to be good to make 
the API non-blocking.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706699#comment-13706699
 ] 

Devaraj K commented on YARN-321:


bq. *Aggregated logs:* Logs will be served and potentially log management 
(expiry etc.) by ApplicationHistoryService via an abstract LogService component.

How about the Job History Server which is currently handling this for MR Jobs? 
Are we moving this responsibility from JHS to AHS to handle for all types of 
applications?


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706753#comment-13706753
 ] 

Devaraj K commented on YARN-321:


I agree Zhijie Shen. 

In the above comment, I mean JHS currently supports serving Aggregated logs for 
MR jobs and managing the Aggregated logs(i.e deleting after 
'yarn.log-aggregation.retain-seconds'). Are we moving aggregated log 
management(i.e deletion after expiry) responsibility to AHS?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706773#comment-13706773
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Are we moving aggregated log management(i.e deletion after expiry) 
responsibility to AHS?

Sorry for misunderstanding your previous question. IMHO, in the recent future, 
we're not moving the aggregated log management, but duplicate it, which Both 
AHS and JHS can serve the same aggregated logs. However, AHS and JHS see the 
same logs from different point of views. AHS simply considers them as container 
logs, no matter what application it is, while JHS know they are the MR job 
logs. [~vinodkv], would you please confirm it?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706553#comment-13706553
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

Fundamentally, this JIRA is to track the management of data related to finished 
applications via a new server called ApplicationHistoryService (AHS). Some 
important design points:

h4. Basics
 - ResoureManager will write per-application data to a (hopefully very) thin 
{{HistoryStorage}} layer.
 - ResourceManager will push the data to HistoryStorage after an application 
finishes in a separate thread.
 - HistoryStorage is different from the current RMStateStore and so unlike 
JobHistory, HistoryStorage isn't used for state-tracking or as a transaction 
log. ResourceManager will try to publish information about completed apps in a 
best-case manner but there will be edge cases during RM restart where we may 
not be flushing some data. Fixing it to be consistent and complete over an RM 
restart will be a future step.
 - HistoryStorage will have publish app-info, retrieve app-info and list apps 
APIs and can have various implementations
   -- A file based implementation where RM writes per-app files to DFS, 
HistoryStorage will take care of file management like we do today in 
JobHistoryServer (JHS) and serve users by reading the data in files
   -- A shared bus implementation where RM directly writes to AHS and AHS 
persists them in a storage that it controls - Files/DB etc.
 - To start with, we will have an implementation with per-app HDFS file.

h4. Miscellaneous

 - *Running as service*: By default, ApplicationHistoryService will be embedded 
inside ResourceManager but will be independent enough to run as a separate 
service for scaling purposes.

 - *User interfaces*: Command line clients and/or web-clients will have RPC and 
web and REST interfaces to interact with ApplicationHistoryService to get info 
about finished applications. Fundamentally, we'll have two types of interfaces
-- Per-app info
-- List of all apps
-- Querying list of apps based on user-name, queue-name etc. To start with, 
we will imitate what JHS does, throw up list of all apps and do the filtering 
client side. But we need a better server side solution.

 - *Aggregated logs*: Logs will be served and potentially log management 
(expiry etc.) by ApplicationHistoryService via an abstract LogService component.

 - *Retention*: ApplicationHistoryService will have components to take care of 
retention - expiring very old apps.

 - *Security*: ApplicationHistoryService will have security from start, will 
use tokens similar to JHS.

h4. Out of scope

 - Hosting/serving per-framework data is out of scope for this JIRA. It is 
related to ApplicationHistoryService but I am keeping focus on generic data for 
now on this JIRA, will file a separate ticket for ApplicationHistoryService or 
a related service to work with per-framework or app data. I see a transition 
phase where we would continue to run AHS and JHS run at the same time till the 
other JIRA is resolved.

 - *Long running services*: We won't be having any special support for long 
running services yet. We should track this with other long running services' 
support.

Feedback apprecitated.

I am going kickstarting this right now. I am creating a branch for faster 
progress. 

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706645#comment-13706645
 ] 

Hitesh Shah commented on YARN-321:
--

{quote}
To start with, we will have an implementation with per-app HDFS file.
{quote}

[~vinodkv] Based on the above, it seems like this will address allowing someone 
to analyse only one job at a time. Based on a per-app file, it will be 
non-trivial to search for applications that match a certain criteria? All jobs 
that run on a certain day? All jobs of a certain type? All jobs that took 
longer than 10 mins to run? All jobs that use over 100 containers? Sure, a 
directory hierarchy based on dates may solve the very basic use-cases but it 
looks like anyone needing to do any slightly more complex analysis on cluster 
utilization will need to build an indexing layer on top of the file-based store?




 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706654#comment-13706654
 ] 

Vinod Kumar Vavilapalli commented on YARN-321:
--

Like I mentioned:
bq. Querying list of apps based on user-name, queue-name etc. To start with, we 
will imitate what JHS does, throw up list of all apps and do the filtering 
client side. But we need a better server side solution.
So for both the CLI and web UI, we will start with a client side basic 
filtering, perhaps coupled with paging on the results. More advanced analytics 
needs a more robust server side solution. I can already imagine file-based 
indices, but a more query friendly storage will be needed - a table view via 
HCat/HBase over HDFS will be a good start.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira