[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-19 Thread Zhijie Shen (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369848#comment-14369848 ]

Zhijie Shen commented on YARN-3040:
---

Please hold the review. Per an offline discussion, for the AM and NM use cases we can 
move the context info to the aggregator directly. I'll create a new patch soon.
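The following minimal Java sketch illustrates the idea of keeping the context in the aggregator. Every name in it (`AppFlowContext`, `PostedEntity`, `ContextAwareAggregator`) is hypothetical, and the field set (cluster, user, flow name, flow run ID) is an assumption. It is not the YARN-3040 patch. It shows one possible shape: the AM or NM posts entities that only name their application, and the aggregator, which already holds that app's flow context, resolves the context when the entity is written.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-app flow context; the field set is an assumption for illustration. */
final class AppFlowContext {
    final String clusterId;
    final String user;
    final String flowName;
    final long flowRunId;

    AppFlowContext(String clusterId, String user, String flowName, long flowRunId) {
        this.clusterId = Objects.requireNonNull(clusterId);
        this.user = Objects.requireNonNull(user);
        this.flowName = Objects.requireNonNull(flowName);
        this.flowRunId = flowRunId;
    }
}

/** Hypothetical entity as posted by an AM or NM: it names its app but carries no flow info. */
final class PostedEntity {
    final String appId;
    final String entityId;

    PostedEntity(String appId, String entityId) {
        this.appId = Objects.requireNonNull(appId);
        this.entityId = Objects.requireNonNull(entityId);
    }
}

/** Hypothetical aggregator that owns the per-app context and resolves it on write. */
final class ContextAwareAggregator {
    private final Map<String, AppFlowContext> contextByApp = new ConcurrentHashMap<>();

    /** Registers the context once per app, e.g. when the aggregator is set up for that app. */
    void registerApp(String appId, AppFlowContext context) {
        contextByApp.put(appId, context);
    }

    /** Looks up the flow context for an incoming entity; a writer would persist both together. */
    AppFlowContext contextFor(PostedEntity entity) {
        AppFlowContext ctx = contextByApp.get(entity.appId);
        if (ctx == null) {
            throw new IllegalStateException("no flow context registered for " + entity.appId);
        }
        return ctx;
    }
}
```

With this split, AM and NM callers never repeat the flow attributes in each write. The cost is that the aggregator must be told the context before the first entity arrives, which is why the sketch fails loudly for unknown apps.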

 [Data Model] Implement client-side API for handling flows
 -

 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3040.1.patch


 Per design in YARN-2928, implement client-side API for handling *flows*. 
 Frameworks should be able to define and pass in all attributes of flows and 
 flow runs to YARN, and they should be passed into ATS writers.
 YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-18 Thread Zhijie Shen (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368040#comment-14368040 ]

Zhijie Shen commented on YARN-3040:
---

I'll take it over. Thanks! - Zhijie



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-18 Thread Naganarasimha G R (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366692#comment-14366692 ]

Naganarasimha G R commented on YARN-3040:
-

No issues, [~zjshen]. Actually, I had misread your comment and thought you wanted 
Robert Kanter to start off with this issue :).
If you have already started, no issues; I will help with review and testing...



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-17 Thread Li Lu (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366614#comment-14366614 ]

Li Lu commented on YARN-3040:
-

Quick comment: in my understanding, the flow-based API is used in multiple 
components, including but not limited to event producers (like distributed 
shell, the RM and NMs), collectors (a.k.a. aggregators), and storage 
implementations. It's not specifically attached to the RM.
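The sketch below illustrates the point that the flow concept is not tied to the RM. It shows a single immutable value type that an event producer, a collector and a storage implementation could all pass around unchanged. `FlowRunKey` and `FlowRunIndex` are hypothetical names, and the choice of identifying fields is an assumption. The "storage" side here only groups application IDs per flow run in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical value type identifying one flow run; not part of any YARN API. */
final class FlowRunKey {
    final String cluster;
    final String user;
    final String flowName;
    final long runId;

    FlowRunKey(String cluster, String user, String flowName, long runId) {
        this.cluster = Objects.requireNonNull(cluster);
        this.user = Objects.requireNonNull(user);
        this.flowName = Objects.requireNonNull(flowName);
        this.runId = runId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FlowRunKey)) return false;
        FlowRunKey other = (FlowRunKey) o;
        return runId == other.runId && cluster.equals(other.cluster)
            && user.equals(other.user) && flowName.equals(other.flowName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cluster, user, flowName, runId);
    }
}

/** Storage-side sketch: groups application IDs under the flow run they belong to. */
final class FlowRunIndex {
    private final Map<FlowRunKey, List<String>> appsByRun = new HashMap<>();

    void add(FlowRunKey run, String appId) {
        appsByRun.computeIfAbsent(run, k -> new ArrayList<>()).add(appId);
    }

    /** Returns a read-only view of the apps recorded for the run (empty if none). */
    List<String> appsOf(FlowRunKey run) {
        return Collections.unmodifiableList(appsByRun.getOrDefault(run, Collections.emptyList()));
    }
}
```

Because equality is by value, a key rebuilt by the collector or the storage layer matches the one the producer created, with no dependency on the RM.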



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-17 Thread Zhijie Shen (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366654#comment-14366654 ]

Zhijie Shen commented on YARN-3040:
---

[~Naganarasimha], thanks for your interest in this issue. I already have a 
WIP patch. If you don't mind, may I continue the work, and would you please 
help review it?



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-17 Thread Naganarasimha G R (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366587#comment-14366587 ]

Naganarasimha G R commented on YARN-3040:
-

Hi [~rkanter] and [~zjshen], 
It seems the scope of this jira is small, and I need to make use of it in 
YARN-3044, so if both of you are OK, I would like to take this jira up.



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-17 Thread Zhijie Shen (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366338#comment-14366338 ]

Zhijie Shen commented on YARN-3040:
---

[~rkanter], would you mind my taking over this jira to move it forward?



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-02-18 Thread Naganarasimha G R (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327034#comment-14327034 ]

Naganarasimha G R commented on YARN-3040:
-

Thanks for the briefing, [~rkanter]. My queries and comments are as follows:
bq. I think the Entities (YARN-3041) are mainly for writing/reading to/from the 
ATS store. Most of the information stored in those Entities is not needed by 
the user when submitting a job. All the user really needs to set is the IDs, 
and some of these we can make optional or determine automatically (e.g. it's 
obvious which cluster it's running on)
Yes, I agree that Flow, Cluster and Flow run are not required for submitting a 
job, and hence if we are only passing the Entity IDs, tags should be sufficient. 
My concern was based on the design doc, section 7 (out of scope), point 1: I am 
under the assumption that Entities can be posted to ATSv2 only by the RM, NM and 
AM, and that clients will not be able to post Flow, Flow run and Cluster 
Entities explicitly. Hence I wanted to know the approach for clients to post 
Flow, Flow run and Cluster Entities. And w.r.t. Cluster info, I remember 
Vrushali mentioning different clusters, like a production and a test cluster, 
which they wanted to capture explicitly.
bq. 100 characters per tag seems like it should be enough; if not, we can maybe 
increase this limit? It is marked as @Evolving
If we are planning to pass Entity IDs to map the application hierarchy, then I 
feel 100 chars per tag should be sufficient. How about making it configurable, 
in case more information needs to be stored per tag?
bq. For example, setFlowId(String id) would simply set the tag
Yes, I agree that these are not first-class YARN concepts; hence, as you 
mentioned, YARN applications can take care of simplifying it. +1 for this 
approach.




[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-02-18 Thread Robert Kanter (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326716#comment-14326716 ]

Robert Kanter commented on YARN-3040:
-

[~Naganarasimha]
# I think the Entities (YARN-3041) are mainly for writing/reading to/from the 
ATS store.  Most of the information stored in those Entities is not needed by 
the user when submitting a job.  All the user really needs to set is the IDs, 
and some of these we can make optional or determine automatically (e.g. it's 
obvious which cluster it's running on).
# 100 characters per tag seems like it should be enough; if not, we can maybe 
increase this limit?  It is marked as {{@Evolving}}.
# Like other properties, we can add a method to JobClient or one of those 
classes that sets the property.  For example, {{setFlowId(String id)}} would 
simply set the tag

Flows and related constructs don't currently exist in YARN.  Unless we add 
these as first-class concepts to the rest of YARN outside of the ATS (e.g. 
instead of only being able to submit YARN applications, you can also submit 
YARN flows; though this is looking more like Oozie...), I think tags are the 
only way to track this information.
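The following plain-Java sketch illustrates the `setFlowId(String id)` idea. It is a small helper that writes prefixed entries into a tag set. The class name, the `flow_id:`/`flow_run_id:` prefixes and `setFlowRunId` are all hypothetical. Only the resulting `Set<String>` is meant to line up with what `ApplicationSubmissionContext#setApplicationTags` accepts. The helper also covers the reverse direction, reading a value back out of the tags, as the server side would need to.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical client-side helper that stores flow attributes as prefixed YARN tags. */
final class FlowTagHelper {
    // Hypothetical prefixes; not a convention defined by YARN.
    static final String FLOW_ID_PREFIX = "flow_id:";
    static final String FLOW_RUN_ID_PREFIX = "flow_run_id:";

    private final Set<String> tags = new LinkedHashSet<>();

    /** Sets (or replaces) the flow ID tag. */
    FlowTagHelper setFlowId(String id) {
        put(FLOW_ID_PREFIX, id);
        return this;
    }

    /** Sets (or replaces) the flow run ID tag. */
    FlowTagHelper setFlowRunId(long runId) {
        put(FLOW_RUN_ID_PREFIX, Long.toString(runId));
        return this;
    }

    /** Snapshot of the tags, e.g. to hand to ApplicationSubmissionContext#setApplicationTags. */
    Set<String> toTags() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(tags));
    }

    /** Reads a value back out of a tag set (the server-side direction). */
    static Optional<String> find(Set<String> tags, String prefix) {
        for (String tag : tags) {
            if (tag.startsWith(prefix)) {
                return Optional.of(tag.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }

    private void put(String prefix, String value) {
        tags.removeIf(tag -> tag.startsWith(prefix)); // keep at most one tag per attribute
        tags.add(prefix + value);
    }
}
```

Calling a setter twice replaces the earlier tag instead of adding a second one. This lets a framework's own client code, such as a method on JobClient as suggested above, expose flow setters without the user handling raw tags.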



[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-02-16 Thread Naganarasimha G R (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323700#comment-14323700 ]

Naganarasimha G R commented on YARN-3040:
-

Hi [~rkanter],
Some queries related to tags and this jira:
# IIUC, users create the Flow and Flow run Entities externally and just give 
their IDs as tags at the time of app submission, so that during creation of the 
app we ensure the hierarchies are updated properly. If my understanding is 
correct, then how can a user create the Flow, Flow run & Cluster?
Or is all the data related to the Flow, Flow run & Cluster passed as part of 
the tags, and if it's not present, do we need to create the entities for them 
at the time of app submission?
# Hopefully the limitations of tags (100 chars in size, ASCII characters only) 
should not be a concern for passing the information to YARN, but it is better 
to capture them if we are considering tags as the interface for passing flow 
and flow run information.
# IMHO, I would have liked to have an explicit interface for clients to pass 
this information rather than tags. Even though tags might serve the purpose, 
they don't seem like a graceful interface for clients.
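The two tag limits raised here (100 characters per tag, ASCII only) can be checked on the client before submission. The minimal sketch below passes the maximum length in as a parameter, to reflect the suggestion of making it configurable. `TagLimits` is a hypothetical name, and this code does not reproduce YARN's own validation logic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical client-side pre-check for the tag limits discussed in this thread. */
final class TagLimits {
    static final int DEFAULT_MAX_TAG_LENGTH = 100; // the per-tag limit quoted in this thread

    /** Returns one message per violation; an empty list means every tag passed. */
    static List<String> validate(Collection<String> tags, int maxLength) {
        List<String> problems = new ArrayList<>();
        for (String tag : tags) {
            if (tag.length() > maxLength) {
                problems.add("too long (" + tag.length() + " > " + maxLength + "): " + tag);
            }
            for (int i = 0; i < tag.length(); i++) {
                if (tag.charAt(i) > 0x7F) { // anything above 7-bit ASCII
                    problems.add("non-ASCII character in: " + tag);
                    break; // report each tag at most once for this rule
                }
            }
        }
        return problems;
    }
}
```

Collecting every problem instead of throwing on the first one lets a client report all bad flow attributes in a single error message.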
