[jira] [Commented] (YARN-321) Generic application history service

2015-03-09 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353825#comment-14353825
 ] 

Allen Wittenauer commented on YARN-321:
---

Looks like this should be closed out with a fix version of 2.4.0?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The MapReduce job history server currently needs to be deployed as a trusted 
 server in sync with the MapReduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) trusted servers 
 (where T is the number of application types and V is the number of versions 
 of each application) is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as JSON (or binary Avro). I propose that we create only one 
 trusted application history server, which can also have a generic UI (display 
 JSON as a tree of strings). A specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the JSON for its own UI and/or analytics.
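
The "generic UI" idea above (display JSON as a tree of strings) can be illustrated with a minimal Python sketch. This is only an illustration, not code from the YARN-321 branch: the function name `render_tree`, the sample record, and its field names are all assumptions made up for this example.

```python
import json


def render_tree(value, name="root", indent=0):
    """Return a list of lines showing a parsed JSON value as an indented tree.

    Objects and arrays become branch lines; every leaf is rendered as a
    string, so the viewer needs no knowledge of the application's schema.
    """
    pad = "  " * indent
    if isinstance(value, dict):
        lines = [f"{pad}{name}"]
        for key, child in value.items():
            lines.extend(render_tree(child, str(key), indent + 1))
        return lines
    if isinstance(value, list):
        lines = [f"{pad}{name}"]
        for index, child in enumerate(value):
            lines.extend(render_tree(child, f"[{index}]", indent + 1))
        return lines
    # Leaf: show the value in its JSON spelling (quoted strings, true, null).
    return [f"{pad}{name}: {json.dumps(value)}"]


# Hypothetical history record; the real job history schema is not shown here.
sample = json.loads(
    '{"jobId": "job_1", "tasks": [{"type": "MAP", "succeeded": true}]}'
)
tree_lines = render_tree(sample, "history")
print("\n".join(tree_lines))
```

An application-specific webapp could ignore this generic view and interpret the same JSON with its own schema knowledge, which is the split between trusted server and untrusted UI that the description proposes.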



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887647#comment-13887647
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #467 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/467/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887731#comment-13887731
 ] 

Hudson commented on YARN-321:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1684 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1684/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-321) Generic application history service

2014-01-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887745#comment-13887745
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1659 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1659/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-321) Generic application history service

2014-01-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887036#comment-13887036
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5074/])
Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882274#comment-13882274
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #462 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/462/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
*

[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882297#comment-13882297
 ] 

Hudson commented on YARN-321:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1679 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1679/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452)

[jira] [Commented] (YARN-321) Generic application history service

2014-01-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882302#comment-13882302
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1654 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1654/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi 
Yamashita. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452)

[jira] [Commented] (YARN-321) Generic application history service

2014-01-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882176#comment-13882176
 ] 

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5039 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5039/])
YARN-321. Merging YARN-321 branch to trunk.
svn merge ../branches/YARN-321 (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs
* /hadoop/common/trunk/hadoop-mapreduce-project
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AHSClient.java
*

[jira] [Commented] (YARN-321) Generic application history service

2014-01-25 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882206#comment-13882206
]

Hudson commented on YARN-321:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5040 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/5040/])
YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi
Yamashita. (vinodkv:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml

[jira] [Commented] (YARN-321) Generic application history service

2014-01-10 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868043#comment-13868043
]

Zhijie Shen commented on YARN-321:
--

bq. 1. Does it provide a function to set maximum files and maximum retention
period of ApplicationHistory to store in HDFS?

No. Currently the FS implementation doesn't discard the historic data of
applications that completed before some point in time; it answers users'
requests based on all the stored applications. However, via the REST API, users
are able to filter out the applications outside a start/finish time window.

bq. 2. When there are many ApplicationHistory in HDFS, does it not limit the
number of the reading of ApplicationHistory?

As to the REST API, users are able to limit the number of applications that AHS
should return. As to HDFS access, the current implementation loads all the
stored applications and filters them one by one, which is not an efficient
approach for a big application collection. YARN-925 has been reopened to
discuss pushing the filtering into the implementation of the history store, so
that we can avoid loading all the applications. Meanwhile, caching (YARN-1322)
is another way to reduce I/O.
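
To illustrate the point about limiting results and filtering by a start/finish
time window, here is a minimal Java sketch. It is not the AHS code: the
AppRecord and TimeWindowFilter names and their fields are hypothetical,
introduced only for this example. It shows a filter that stops iterating once
the requested limit is reached, which is what a store-side filter could exploit
to avoid reading every stored application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for illustration; not the actual AHS data model.
final class AppRecord {
    final String id;
    final long startTime;
    final long finishTime;

    AppRecord(String id, long startTime, long finishTime) {
        this.id = id;
        this.startTime = startTime;
        this.finishTime = finishTime;
    }
}

final class TimeWindowFilter {
    // Returns at most `limit` applications that started at or after
    // `windowStart` and finished at or before `windowEnd`. Iteration stops
    // as soon as the limit is reached, so a lazily iterating store would not
    // need to load the remaining records.
    static List<AppRecord> select(Iterable<AppRecord> stored,
                                  long windowStart, long windowEnd, int limit) {
        List<AppRecord> result = new ArrayList<>();
        for (AppRecord app : stored) {
            if (result.size() >= limit) {
                break;
            }
            if (app.startTime >= windowStart && app.finishTime <= windowEnd) {
                result.add(app);
            }
        }
        return result;
    }
}
```

Whether the real store can iterate lazily depends on its on-disk layout, which
this sketch does not model.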

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2014-01-07 Thread Shinichi Yamashita (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865140#comment-13865140
]

Shinichi Yamashita commented on YARN-321:
-

I reviewed the attached design document, and I have two questions about
FileSystemApplicationHistoryStore.

1. Does it provide a function to set maximum files and maximum retention period
of ApplicationHistory to store in HDFS?
2. When there are many ApplicationHistory in HDFS, does it not limit the
number of the reading of ApplicationHistory?

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-12-23 Thread Robert Joseph Evans (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855707#comment-13855707
]

Robert Joseph Evans commented on YARN-321:
--

The way it currently works is based on group permissions on a directory
(this is from memory from a while ago, so I could be off on a few things). In
HDFS, when you create a file, the group of the file is the group of the
directory the file is a part of, similar to the setgid bit on a directory in
Linux. When an MR job completes, it copies its history log file, along with a
few other files, to a drop-box-like location called intermediate done and
atomically renames it from a temp name to the final name. The directory is
world writable, but only readable by a special group that the history server is
a part of and general users are not. The history server wakes up periodically
and scans that directory for new files; when it sees new files, it moves them
to a final location that is owned by the headless history server user. If a
query comes in for a job that the history server is not aware of, it will also
scan the intermediate done directory before failing.

Reading history data is done through RPC to the history server, or through the
web interface, including RESTful APIs. There is no supported way for an app to
read history data directly through the file system. I hope this helps.
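
The write-to-a-temp-name-then-rename step described above can be sketched as
follows. This is only a local-filesystem analogue using java.nio, not the HDFS
FileSystem API and not the MR history code; the DropBoxPublisher name and the
".tmp" suffix are assumptions made for the example, and the group/permission
setup is not modeled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Hadoop.
final class DropBoxPublisher {
    // Writes the content under a temporary name, then renames it to the
    // final name in a single step, so a directory scanner that looks only
    // for final names never observes a partially written file.
    static Path publish(Path intermediateDir, String finalName, String content)
            throws IOException {
        Path tmp = intermediateDir.resolve(finalName + ".tmp");
        Path done = intermediateDir.resolve(finalName);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A scanner on the reading side would then skip any name ending in the temporary
suffix and treat everything else as complete.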

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf,
Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


[jira] [Commented] (YARN-321) Generic application history service

2013-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856021#comment-13856021
]

Vinod Kumar Vavilapalli commented on YARN-321:
--

Tx Zhijie for answering most of Sandy's questions, you are spot on. I'll update
the design doc to clarify things where it isn't clear.

bq. What is the jira for app specific history data?
I just filed YARN-1530, will post more information soon.

bq. Could you describe the security requirements a bit further. It's not clear
to everyone how everything works currently. To be clear, what exactly needs to
be done to make apps write and read history data.
The data covered in this JIRA is generic and only RM gets to write it. The
consumers of this data are *both* the cluster admins for historical analyses as
well as individual apps that choose not to use features that come out of
YARN-1530. As such, we cannot let apps write history data.

bq. How is the shared bus different from writing to a file. I would think one
would cover the other.
Yes, writing to a file is one example of a shared-bus. I'll fix it if the doc
is confusing w.r.t this.

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-12-19 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853439#comment-13853439
 ] 

Sandy Ryza commented on YARN-321:
-

Thanks Vinod for the design doc. A few additional questions:

Will the AHS serve logs?
Is my understanding correct that for the first version, the RM will not do any
RPCs to the AHS?
Will APIs be Public / Stable?
In the first version, what will be provided in terms of web UI?
How are ACLs enforced between the AHS and RM?
Eventually we want the AHS to interact well with long-running services, right?
For the file-based bus, have we thought about an approach for how it might store
info for applications in non-terminal states?


 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java



[jira] [Commented] (YARN-321) Generic application history service

2013-12-19 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853492#comment-13853492
]

Zhijie Shen commented on YARN-321:
--

AFAIK,

bq. Will the AHS serve logs?

It will serve the aggregated logs.

bq. Is my understanding correct that for the first version, the RM will not do
any RPCs to the AHS.

Yes, RM and AHS don't communicate via RPC, but via the shared bus.

bq. Will APIs be Public / Stable?

Assuming you're talking about the RPC interface, it will be Public but Unstable.

bq. In the first version, what will be provided in terms of web UI?

* The page of a list of completed applications (FINISHED, FAILED, KILLED) in
HistoryStorage
* The page of one application and a list of its attempts
* The page of one attempt and a list of its containers
* The page of one container
* The page of the container logs

bq. How are ACLs enforced between the AHS and RM?

Is your question how the ACLs that RM enforces on an application while it is
running are inherited by AHS once the application has finished and been stored
in HistoryStorage? My rough idea is to persist the ACL info into HistoryStorage
as well. For ApplicationACLs, this seems fairly straightforward. For QueueACLs,
I'm not quite sure; it may be complex and depend on the scheduler.

bq. Eventually we want the AHS to interact well with long-running services,
right?

I think so.

bq. For the file-based bus, have we thought about an approach for how it might
store info for applications in non-terminal states?

Currently, HistoryStorage receives a series of events that notify it of
updates to an application/attempt/container. The info is accumulated
incrementally. IMO, the same strategy can be applied to a long-running app by
defining more intermediate events (for now we only define start and finish
events) to report progress. Then, instead of holding the events until the app
finishes, we need to flush them in advance/periodically/..., so that users can
be informed of the latest progress of the app.
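
As a rough illustration of accumulating events incrementally and flushing them
before the application finishes, consider the sketch below. The
HistoryEventBuffer class, its count-based flush threshold, and the string
events are all hypothetical; the actual HistoryStorage event types and flush
policy are not specified in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event buffer for illustration; not an AHS class.
final class HistoryEventBuffer {
    private final List<String> pending = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();
    private final int flushThreshold;

    HistoryEventBuffer(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Records an event and flushes once enough events have accumulated,
    // instead of waiting for the application's finish event.
    synchronized void onEvent(String event) {
        pending.add(event);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    // Moves pending events to the (simulated) durable store.
    synchronized void flush() {
        flushed.addAll(pending);
        pending.clear();
    }

    // Returns a copy of what a reader of the store would currently see.
    synchronized List<String> visibleToReaders() {
        return new ArrayList<>(flushed);
    }
}
```

A time-based flush (for example from a scheduled task) would follow the same
shape; only the trigger for flush() changes.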

Hopefully the answers address your questions. :-)

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-12-03 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838096#comment-13838096
]

Zhijie Shen commented on YARN-321:
--

bq. Will this JIRA be included in the next release?

Yes, AHS is included in the plan for 2.4.

bq. Another question about this JIRA. I found that the container logURL is
hard-coded there, so users still cannot see the logs of each container (stdout,
stderr). Is it on the roadmap to allow users to see the logs? And which JIRA
is tracking this?

Are you talking about showing aggregated logs via web interfaces? If so, you
may want to keep an eye on YARN-1413, which is tracking this issue.

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-12-02 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837276#comment-13837276
 ] 

Jeff Zhang commented on YARN-321:
-

Will this JIRA be included in the next release?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 HistoryStorageDemo.java



[jira] [Commented] (YARN-321) Generic application history service

2013-12-02 Thread Jeff Zhang (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837281#comment-13837281
]

Jeff Zhang commented on YARN-321:
-

Another question about this JIRA. I found that the container logURL is
hard-coded there, so users still cannot see the logs of each container (stdout,
stderr). Is it on the roadmap to allow users to see the logs? And which JIRA
is tracking this? Thanks.

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-10-09 Thread Robert Joseph Evans (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790555#comment-13790555
]

Robert Joseph Evans commented on YARN-321:
--

I like the diagrams, but I want to understand if the generic application
history service is intended to replace the job history server, or to just
augment it?

I would prefer it if we could replace the current server. Perhaps not in the
first release, but eventually. To make that work we would have to provide a
way for MR-specific code to come up and run inside the service, exposing the
current RESTful web service, an application-specific UI, and the RPC server
that we currently run.

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-10-09 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790604#comment-13790604
]

Zhijie Shen commented on YARN-321:
--

bq. I like the diagrams, but I want to understand if the generic application
history service is intended to replace the job history server, or to just
augment it?

Yes, there's some previous discussion on recording the per-application-type
history data, but we plan to exclude it from the initial version of AHS.
Eventually, we'd like to integrate JHS into AHS in some way, the details of
which we could discuss in follow-up JIRAs. To me, it would be better if we can
design a common per-application-type plugin framework, so that we can easily
integrate the history servers of other applications on YARN.

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-10-08 Thread Sandy Ryza (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789876#comment-13789876
]

Sandy Ryza commented on YARN-321:
-

Was a design doc ever written up for this? The HistoryStorageDemo.java is a
good start for understanding some of the interfaces, but it would be helpful to
have something that explains things like what the Application History Service's
role is, how it interacts with the RM, and key differences and similarities
with the Job History Server.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java


[jira] [Commented] (YARN-321) Generic application history service

2013-10-08 Thread Sandy Ryza (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790047#comment-13790047
]

Sandy Ryza commented on YARN-321:
-

Thanks Vinod and Zhijie. Didn't see the comment. I'm going to attach your
outline as a pdf to make it a little easier for passers-by to learn about.
Here's the google doc it came from if you want to edit:
https://docs.google.com/document/d/1cNsdGyLuagR8lzfeQrAclOAd-AdkVwgST6OG8Zzp43M/edit#heading=h.15p1lkmmm9g8

Generic application history service
---


[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Karthik Kambatla (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712546#comment-13712546
]

Karthik Kambatla commented on YARN-321:
---

bq. Folks, it would be great if we have a consolidated document that describes
the design and some details.
+1

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-321) Generic application history service

2013-07-18 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712796#comment-13712796
]

Zhijie Shen commented on YARN-321:
--

bq. Running as service: By default, ApplicationHistoryService will be embedded
inside ResourceManager but will be independent enough to run as a separate
service for scaling purposes.

IIUC, to be independent, ApplicationHistoryService should have its own event
dispatcher, shouldn't it?

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-17 Thread Bikas Saha (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711345#comment-13711345
]

Bikas Saha commented on YARN-321:
-

Folks, it would be great if we have a consolidated document that describes the
design and some details. We can keep posting revisions of the document to this
JIRA as things change significantly. HDFS-2802 is an example. It's a little
difficult to piece it together from multiple comments and clarifications.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Karthik Kambatla (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709940#comment-13709940
]

Karthik Kambatla commented on YARN-321:
---

The approach in HistoryStorageDemo looks good; I like the fact that the schema
goes into the tuple.

(Thinking out loud) Are we decided on who all can write to HistoryStorage?
# Single writer - RM? If so, the AMs should pass all the information to the RM.
We need to carefully handle the HA scenarios, maybe as part of the HA work.
# Both AMs and RM write to HistoryStorage directly? Writes synchronized at the
tuple level? I am thinking of long-running services here - they might want
AM/RM to write tuples every so often.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Mayank Bansal (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709966#comment-13709966
]

Mayank Bansal commented on YARN-321:

I think HistoryStorage should be written by AHS, not RM; in that case RM will
have less load and will scale better.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709989#comment-13709989
]

Vinod Kumar Vavilapalli commented on YARN-321:
--

bq. How about starting with an in-memory implementation, which is the easiest
to do and is useful for testing.
We can have one for testing, but a file-system based impl should exist from the
beginning.

bq. Here's more elaboration about the per-application data. There should be
three objects to record: RMApp, RMAppAttempt and RMContainer. ...
Tx for the detailed analysis, [~zjshen]!

bq. In addition, HistoryStorage APIs may involve a lot of I/O operations such
that the response of an API will be long. Therefore, it is likely to be good to
make the API non-blocking.
+1

bq. Are we moving aggregated log management(i.e deletion after expiry)
responsibility to AHS?
bq. Right now it's not clear what needs to be done for log aggregation.
bq. Sorry for misunderstanding your previous question. IMHO, in the near
future, we're not moving the aggregated log management, but duplicating it, so
that both AHS and JHS can serve the same aggregated logs. However, AHS and JHS
see the same logs from different points of view. AHS simply considers them as
container logs, no matter what application it is, while JHS knows they are the
MR job logs. Vinod Kumar Vavilapalli, would you please confirm it?
That's an interesting point about JHS knowing that they are MR job logs. But we
don't do anything special today. I'm thinking of just pulling out log-handling
and using it in both AHS and JHS for the time being.

bq. ResourceManager will push the data to HistoryStorage after an
application finishes in a separate thread.
bq. Is it per application or only one thread in RM?
I foresee a single thread.

bq. Wouldn't it be a good idea to send the information to AHS as soon as an
application starts, and let AHS write all the data published by RM for that
application? In that case there would be very little overhead for RM.
Like I mentioned in my proposal, this could be one implementation of
HistoryStorage.

bq. What about the cases where RM restarts or crashes? In those cases, does RM
have to republish all the running applications to AHS, or just forget about the
previously running apps?
This was covered. In the first cut, we'll make best efforts. Losing an app's
data in such a corner case is bearable.

bq. (Thinking out loud) Are we decided on who all can write to HistoryStorage?
I thought this was clear. Only RM. No per-framework data or data from AMs to
AHS directly. Yet, at the least.

bq. I think HistoryStorage should be written by AHS, not RM; in that case RM
will have less load and will scale better.
This is an implementation detail: whether the RM writes files directly or
pushes to AHS is up to the impl. Let's do the APIs first.

Overall, I repeat that we should separate the design for per-framework data,
fundamentally because the AHS is a shared service. It needs a radically
different design from my initial thoughts. I will file a separate JIRA and post
my thoughts there.
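
The "single thread" arrangement for pushing finished applications to
HistoryStorage could look roughly like the sketch below. This is an assumption
about one possible shape, not the RM implementation: SingleThreadHistoryWriter
is a hypothetical name, the store is simulated with an in-memory list, and
failure handling (the best-effort behaviour mentioned above) is left out.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical writer for illustration; not an RM class.
final class SingleThreadHistoryWriter {
    // One dedicated thread performs all writes, in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the real history store.
    private final List<String> store = new CopyOnWriteArrayList<>();

    // Returns immediately; the write itself runs on the writer thread, so
    // the caller (e.g. an event handler) is not blocked on store I/O.
    void applicationFinished(String appId) {
        writer.execute(() -> store.add(appId));
    }

    // Stops accepting new work, waits for queued writes, and returns the
    // contents of the simulated store.
    List<String> shutdownAndGet() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
        return store;
    }
}
```

Because a single-thread executor runs tasks sequentially, records reach the
store in the order the applications finished.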

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-16 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709991#comment-13709991
]

Vinod Kumar Vavilapalli commented on YARN-321:
--

I just created the YARN-321 branch and am filing sub-tasks. Let's move ahead.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
Attachments: HistoryStorageDemo.java

[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Mayank Bansal (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709030#comment-13709030
]

Mayank Bansal commented on YARN-321:

Overall looks good. However, some points to consider:

ResourceManager will push the data to HistoryStorage after an application
finishes in a separate thread.

Is it per application or only one thread in RM?

Wouldn't it be a good idea to send the information to AHS as soon as an
application starts, and let AHS write all the data published by RM for that
application? In that case there would be very little overhead for RM.

What about the cases where RM restarts or crashes? In those cases, does RM have
to republish all the running applications to AHS, or just forget about the
previously running apps?

Right now it's not clear what needs to be done for log aggregation.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709098#comment-13709098
]

Zhijie Shen commented on YARN-321:
--

bq. Is it per application or only one thread in RM?

I think it should be one thread in RM.

bq. Wouldn't it be a good idea to send the information to AHS as soon as an
application starts, and let AHS write all the data published by RM for that
application?

I'm afraid a number of metrics cannot be determined when an application has
just been started, such as the finish time and the final status.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Karthik Kambatla (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709192#comment-13709192
]

Karthik Kambatla commented on YARN-321:
---

A few other considerations:

bq. Running as service: By default, ApplicationHistoryService will be embedded
inside ResourceManager but will be independent enough to run as a separate
service for scaling purposes.
Is there a reason to embed this inside the RM? I don't know if there were
reasons for the JHS to be separate, other than it being MR-specific. If there
were, this would go against those, no?
That said, I agree it will be easier for the user if AHS starts along with the
RM. Maybe that should be configurable and turned on by default?

bq. Hosting/serving per-framework data is out of scope for this JIRA.
I understand and agree it makes sense not to complicate it. However, during the
design, it would be nice to outline (at least at a high level) how the
plugins can work. For the plugins to serve application-specific information
(e.g. MapReduce counters), I suspect the RM should write this information in
addition to the generic YARN information about that application. On completion,
can we leave a provision for the AM to write a json blob (maybe via RM) to
{{HistoryStorage}}? In the AHS, can we leave a provision for app plugins to
access/use this information to render application specifics?

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Jason Lowe (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709262#comment-13709262
]

Jason Lowe commented on YARN-321:
-

bq. Is there a reason to embed this inside the RM? I don't know if there were
reasons for the JHS to be separate, other than it being MR-specific.

IIRC the history server was embedded in the JT back in 1.x and was only split
out as a separate daemon to keep the RM from having a dependency on MR.

bq. That said, I agree it will be easier for the user if AHS starts along with
the RM. Maybe that should be configurable and turned on by default?

That'd be my preference, and the proxyserver is already done this way. One can
run it either as part of the RM (default) or set up some configs and launch it
separately via {{yarn proxyserver}}.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706673#comment-13706673
]

Zhijie Shen commented on YARN-321:
--

bq. To start with, we will have an implementation with per-app HDFS file.

How about starting with an in-memory implementation, which is the easiest to do
and is useful for testing.

Generic application history service
---

Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706686#comment-13706686
 ] 

Zhijie Shen commented on YARN-321:
--

bq. ResourceManager will write per-application data to a (hopefully very) thin
HistoryStorage layer.

Here's more elaboration about the per-application data. There should be three
objects to record: RMApp, RMAppAttempt and RMContainer. Below are the
properties of each object:

Completed Application:
* Application ID
* Application Name
* Application Type
* User
* Queue
* Submit Time
* Start Time
* Finish Time
* Diagnostics Info
* Final Application Status
* Num of Application Attempts

Completed Application Attempt:
* Application Attempt ID
* Application ID
* Host
* RPC Port
* Tracking URL
* Original Tracking URL (not sure it is necessary)
* Diagnostics Info
* Final Application Status
* Master Container ID
* Num of Containers

Completed Container:
* Container ID
* Application Attempt ID
* Final Container Status
* Resource
* Priority
* Node ID
* Log URL

Application has one-to-many relationship with Application Attempt, while 
Application Attempt has one-to-one relationship with Container.

With respect to the concrete information to record, here are some more ideas 
about the HistoryStorage interface. The following APIs should be useful for the 
RM to persist application history and for the AHS to query it:
* Iterable<CompletedApplication> getApplications([conditions...])
* CompletedApplication getApplication(ApplicationId)
* Iterable<CompletedApplicationAttempt> getApplicationAttempts(ApplicationId)
* CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId)
* CompletedContainer getContainer(ApplicationAttemptId)
* CompletedContainer getContainer(ContainerId)
* void addApplication(CompletedApplication)
* void addApplicationAttempt(CompletedApplicationAttempt)
* void addContainer(CompletedContainer)
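
To make the list above concrete, here is a minimal in-memory sketch of such an
interface in Java. It is not the committed API. The ID and record types are
trimmed-down stand-ins (Java 16+ records), the "[conditions...]" argument is
rendered as a single Predicate, and the attempt-keyed getContainer is assumed
to return the attempt's master container, which the list does not spell out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical, trimmed-down ID and record types for this sketch only.
record ApplicationId(String value) {}
record ApplicationAttemptId(ApplicationId appId, int attempt) {}
record ContainerId(ApplicationAttemptId attemptId, long id) {}
record CompletedApplication(ApplicationId id, String user) {}
record CompletedApplicationAttempt(ApplicationAttemptId id, ContainerId masterContainerId) {}
record CompletedContainer(ContainerId id, String nodeId) {}

interface HistoryStorage {
    // Assumption: "[conditions...]" is expressed as one Predicate.
    Iterable<CompletedApplication> getApplications(Predicate<CompletedApplication> condition);
    CompletedApplication getApplication(ApplicationId id);
    Iterable<CompletedApplicationAttempt> getApplicationAttempts(ApplicationId id);
    CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId id);
    CompletedContainer getContainer(ApplicationAttemptId id);
    CompletedContainer getContainer(ContainerId id);
    void addApplication(CompletedApplication app);
    void addApplicationAttempt(CompletedApplicationAttempt attempt);
    void addContainer(CompletedContainer container);
}

// Keeps everything in concurrent maps; lookups return null when nothing is stored.
final class InMemoryHistoryStorage implements HistoryStorage {
    private final Map<ApplicationId, CompletedApplication> apps = new ConcurrentHashMap<>();
    private final Map<ApplicationAttemptId, CompletedApplicationAttempt> attempts = new ConcurrentHashMap<>();
    private final Map<ContainerId, CompletedContainer> containers = new ConcurrentHashMap<>();

    @Override public Iterable<CompletedApplication> getApplications(Predicate<CompletedApplication> condition) {
        return apps.values().stream().filter(condition).collect(Collectors.toList());
    }
    @Override public CompletedApplication getApplication(ApplicationId id) {
        return apps.get(id);
    }
    @Override public Iterable<CompletedApplicationAttempt> getApplicationAttempts(ApplicationId id) {
        return attempts.values().stream()
                .filter(a -> a.id().appId().equals(id))
                .collect(Collectors.toList());
    }
    @Override public CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId id) {
        return attempts.get(id);
    }
    @Override public CompletedContainer getContainer(ApplicationAttemptId id) {
        // Assumption: the attempt-keyed lookup resolves to the master container.
        CompletedApplicationAttempt attempt = attempts.get(id);
        return attempt == null ? null : containers.get(attempt.masterContainerId());
    }
    @Override public CompletedContainer getContainer(ContainerId id) {
        return containers.get(id);
    }
    @Override public void addApplication(CompletedApplication app) {
        apps.put(app.id(), app);
    }
    @Override public void addApplicationAttempt(CompletedApplicationAttempt attempt) {
        attempts.put(attempt.id(), attempt);
    }
    @Override public void addContainer(CompletedContainer container) {
        containers.put(container.id(), container);
    }
}
```

In this sketch the RM side would only call the add methods and the AHS side
only the get methods, so the same object can back both roles in a test.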

In addition, HistoryStorage APIs may involve many I/O operations, so an API 
call may take a long time to respond. It is therefore probably a good idea to 
make the API non-blocking.
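
One way to picture a non-blocking read is to return a future and run the slow
I/O on a separate pool. The sketch below uses java.util.concurrent's
CompletableFuture; the class name, the String-typed IDs and the map standing in
for slow storage are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical non-blocking reader: callers get a future immediately,
// while the (potentially slow) lookup runs on the supplied I/O pool.
final class AsyncHistoryReader {
    private final Map<String, String> store;   // stand-in for HDFS or DB backed storage
    private final ExecutorService ioPool;

    AsyncHistoryReader(Map<String, String> store, ExecutorService ioPool) {
        this.store = store;
        this.ioPool = ioPool;
    }

    CompletableFuture<String> getApplication(String applicationId) {
        return CompletableFuture.supplyAsync(() -> blockingRead(applicationId), ioPool);
    }

    private String blockingRead(String applicationId) {
        // A real implementation would perform file or database I/O here.
        return store.get(applicationId);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AsyncHistoryReader reader =
                new AsyncHistoryReader(Map.of("application_1_0001", "SUCCEEDED"), pool);
        reader.getApplication("application_1_0001")
                .thenAccept(status -> System.out.println("status: " + status))
                .join();   // only the demo waits; a server would chain further stages
        pool.shutdown();
    }
}
```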


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Devaraj K (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706699#comment-13706699
]

Devaraj K commented on YARN-321:

bq. *Aggregated logs:* Logs will be served, and potentially managed (expiry
etc.), by ApplicationHistoryService via an abstract LogService component.

What about the Job History Server, which currently handles this for MR jobs?
Are we moving this responsibility from the JHS to the AHS so that it covers all
types of applications?


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Devaraj K (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706753#comment-13706753
]

Devaraj K commented on YARN-321:

I agree, Zhijie Shen.

In the above comment, I meant that JHS currently serves aggregated logs for MR
jobs and manages them (i.e. deletes them after
'yarn.log-aggregation.retain-seconds'). Are we moving aggregated log
management (i.e. deletion after expiry) responsibility to AHS?


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706773#comment-13706773
]

Zhijie Shen commented on YARN-321:
--

bq. Are we moving aggregated log management (i.e. deletion after expiry)
responsibility to AHS?

Sorry for misunderstanding your previous question. IMHO, in the near future
we're not moving the aggregated log management but duplicating it, so that both
AHS and JHS can serve the same aggregated logs. However, AHS and JHS see the
same logs from different points of view: AHS simply treats them as container
logs, whatever the application is, while JHS knows they are MR job logs.
[~vinodkv], would you please confirm this?


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706553#comment-13706553
]

Vinod Kumar Vavilapalli commented on YARN-321:
--

Fundamentally, this JIRA is to track the management of data related to finished
applications via a new server called ApplicationHistoryService (AHS). Some
important design points:

h4. Basics
- ResourceManager will write per-application data to a (hopefully very) thin
{{HistoryStorage}} layer.
- ResourceManager will push the data to HistoryStorage after an application
finishes in a separate thread.
- HistoryStorage is different from the current RMStateStore and so, unlike
JobHistory, HistoryStorage isn't used for state-tracking or as a transaction
log. ResourceManager will try to publish information about completed apps on a
best-effort basis, but there will be edge cases during RM restart where some
data may not be flushed. Making this consistent and complete across an RM
restart will be a future step.
- HistoryStorage will have publish app-info, retrieve app-info and list apps
APIs and can have various implementations
-- A file based implementation where RM writes per-app files to DFS,
HistoryStorage will take care of file management like we do today in
JobHistoryServer (JHS) and serve users by reading the data in files
-- A shared bus implementation where RM directly writes to AHS and AHS
persists them in a storage that it controls - Files/DB etc.
- To start with, we will have an implementation with per-app HDFS file.
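
A rough sketch of the publishing path described above: the RM hands finished-app
data to a dedicated writer thread and carries on, and a failed write is logged
and dropped rather than retried. The class name and the Consumer standing in
for HistoryStorage are assumptions for illustration, not the design's actual
types.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical publisher: writes happen on one background thread so the
// caller (the RM in this design) never blocks on storage I/O.
final class BestEffortHistoryPublisher implements AutoCloseable {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Consumer<String> storage;   // stand-in for a HistoryStorage write call

    BestEffortHistoryPublisher(Consumer<String> storage) {
        this.storage = storage;
    }

    // Called when an application finishes; returns immediately.
    void applicationFinished(String appInfo) {
        writer.execute(() -> {
            try {
                storage.accept(appInfo);
            } catch (RuntimeException e) {
                // Best effort: report the failure and move on, no retry.
                System.err.println("history write failed, dropping: " + e.getMessage());
            }
        });
    }

    // Stops accepting work and waits briefly for queued writes to drain.
    @Override
    public void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (BestEffortHistoryPublisher publisher =
                     new BestEffortHistoryPublisher(info -> System.out.println("stored " + info))) {
            publisher.applicationFinished("application_1_0001");
        }
    }
}
```

Anything still queued when the process dies is lost, which is the kind of
restart edge case the design accepts for now.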

h4. Miscellaneous

- *Running as service*: By default, ApplicationHistoryService will be embedded
inside ResourceManager but will be independent enough to run as a separate
service for scaling purposes.

- *User interfaces*: Command-line clients and/or web clients will use RPC, web
and REST interfaces to interact with ApplicationHistoryService to get info
about finished applications. Fundamentally, we'll have two types of interfaces:
-- Per-app info
-- List of all apps
-- Querying list of apps based on user-name, queue-name etc. To start with,
we will imitate what JHS does, throw up list of all apps and do the filtering
client side. But we need a better server side solution.

- *Aggregated logs*: Logs will be served, and potentially managed (expiry
etc.), by ApplicationHistoryService via an abstract LogService component.

- *Retention*: ApplicationHistoryService will have components to take care of
retention - expiring very old apps.

- *Security*: ApplicationHistoryService will have security from the start and
will use tokens similar to JHS.
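
For the retention point above, a minimal sketch of expiring old entries might
look like the following. The method name, the map of finish times and the
explicit "now" parameter are assumptions chosen to keep the example testable;
the design does not specify how retention is implemented.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical retention pass over an in-memory view of finished apps.
final class RetentionSketch {
    // Removes every app that finished before (now - retention) and returns the removed IDs.
    static List<String> expire(Map<String, Instant> finishTimes, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> it = finishTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> entry = it.next();
            if (entry.getValue().isBefore(cutoff)) {
                removed.add(entry.getKey());
                it.remove();   // safe removal while iterating
            }
        }
        return removed;
    }
}
```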

h4. Out of scope

- Hosting/serving per-framework data is out of scope for this JIRA. It is
related to ApplicationHistoryService, but I am keeping the focus on generic
data for now on this JIRA and will file a separate ticket for
ApplicationHistoryService or a related service to work with per-framework or
app data. I see a transition phase where we would continue to run AHS and JHS
at the same time until the other JIRA is resolved.

- *Long running services*: We won't be having any special support for long
running services yet. We should track this with other long running services'
support.

Feedback appreciated.

I am kickstarting this right now. I am creating a branch for faster
progress.


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Hitesh Shah (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706645#comment-13706645
]

Hitesh Shah commented on YARN-321:
--

{quote}
To start with, we will have an implementation with per-app HDFS file.
{quote}

[~vinodkv] Based on the above, it seems this will only allow someone to analyse
one job at a time. With a per-app file, won't it be non-trivial to search for
applications that match certain criteria? All jobs that run on a certain day?
All jobs of a certain type? All jobs that took longer than 10 mins to run? All
jobs that use over 100 containers? Sure, a directory hierarchy based on dates
may solve the very basic use-cases, but it looks like anyone needing to do any
slightly more complex analysis of cluster utilization will need to build an
indexing layer on top of the file-based store?
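
To illustrate the date-based hierarchy mentioned above, here is a small sketch
that derives a per-app file path from the finish time. The root directory, the
yyyy/MM/dd layout and the use of UTC are assumptions for this example only.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical layout: <root>/<yyyy>/<MM>/<dd>/<applicationId>
final class DatedHistoryPath {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    // Directory holding every app that finished on the same (UTC) day.
    static String dayDir(String root, Instant finishTime) {
        return root + "/" + DAY.format(finishTime);
    }

    // Full path of one application's history file.
    static String appFile(String root, String applicationId, Instant finishTime) {
        return dayDir(root, finishTime) + "/" + applicationId;
    }
}
```

With such a layout, "all jobs on a certain day" is a single directory listing,
but filtering by type, duration or container count still means opening every
file, which is the gap an indexing layer would have to fill.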


[jira] [Commented] (YARN-321) Generic application history service

2013-07-11 Thread Vinod Kumar Vavilapalli (JIRA)

[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706654#comment-13706654
]

Vinod Kumar Vavilapalli commented on YARN-321:
--

Like I mentioned:
bq. Querying list of apps based on user-name, queue-name etc. To start with, we
will imitate what JHS does, throw up list of all apps and do the filtering
client side. But we need a better server side solution.
So for both the CLI and the web UI, we will start with basic client-side
filtering, perhaps coupled with paging of the results. More advanced analytics
needs a more robust server-side solution. I can already imagine file-based
indices, but a more query-friendly storage will be needed; a table view via
HCat/HBase over HDFS would be a good start.
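
The client-side approach described above can be sketched as follows: fetch the
full list, filter it locally, then cut the result into pages. The AppSummary
type (a Java 16+ record) and the method name are hypothetical, and the example
ignores how the full list is fetched.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical summary row as a client might receive it.
record AppSummary(String id, String user, String queue) {}

final class ClientSideAppFilter {
    // Filters the complete list on the client, then returns one zero-based page of matches.
    static List<AppSummary> filterAndPage(List<AppSummary> allApps,
                                          Predicate<AppSummary> filter,
                                          int pageIndex, int pageSize) {
        return allApps.stream()
                .filter(filter)
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AppSummary> all = List.of(
                new AppSummary("application_1_0001", "alice", "default"),
                new AppSummary("application_1_0002", "bob", "default"),
                new AppSummary("application_1_0003", "alice", "etl"));
        System.out.println(filterAndPage(all, a -> a.user().equals("alice"), 0, 10));
    }
}
```

Every query still transfers the whole list to the client, so the cost grows
with the number of finished apps, not with the size of the result.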

