[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369848#comment-14369848 ]

Zhijie Shen commented on YARN-3040:
-----------------------------------

Please hold the review. Per offline discussion, for the AM and NM use case we can move the context info to the aggregator directly. I'll create a new patch soon.

[Data Model] Implement client-side API for handling flows
---------------------------------------------------------

Key: YARN-3040
URL: https://issues.apache.org/jira/browse/YARN-3040
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
Attachments: YARN-3040.1.patch

Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368040#comment-14368040 ]

Zhijie Shen commented on YARN-3040:
-----------------------------------

Take it over. Thanks!

- Zhijie
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366692#comment-14366692 ]

Naganarasimha G R commented on YARN-3040:
-----------------------------------------

No issues, [~zjshen]. I had actually misread your comment and thought you wanted Robert Kanter to start off with this issue :). If you have already started, no issues; I will help with review and testing...

[Data Model] Implement client-side API for handling flows
---------------------------------------------------------

Key: YARN-3040
URL: https://issues.apache.org/jira/browse/YARN-3040
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366614#comment-14366614 ]

Li Lu commented on YARN-3040:
-----------------------------

Quick comment: in my understanding, the flow-based API is used in multiple components, including but not limited to event producers (like the distributed shell, the RM, and the NMs), collectors (a.k.a. aggregators), and storage implementations. It is not specifically attached to the RM.
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366654#comment-14366654 ]

Zhijie Shen commented on YARN-3040:
-----------------------------------

[~Naganarasimha], thanks for your interest in this issue. I already have a WIP patch. If you don't mind, may I continue the work, and would you please help review it?
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366587#comment-14366587 ]

Naganarasimha G R commented on YARN-3040:
-----------------------------------------

Hi [~rkanter] and [~zjshen],

It seems the scope of this jira is small, and I need to make use of it in YARN-3044, so if both of you are OK with it, I would like to take this jira up.
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366338#comment-14366338 ]

Zhijie Shen commented on YARN-3040:
-----------------------------------

[~rkanter], would you mind my taking over this jira to move it forward?
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327034#comment-14327034 ]

Naganarasimha G R commented on YARN-3040:
-----------------------------------------

Thanks for the briefing, [~rkanter]. My queries and comments are as follows:

bq. I think the Entities (YARN-3041) are mainly for writing/reading to/from the ATS store. Most of the information stored in those Entities are not needed by the user when submitting a job. All the user really needs to set is the IDs, and some of these we can make optional or determine automatically (e.g. it's obvious which cluster it's running on)

Yes, I agree that the Flow, Cluster, and Flow run are not required for submitting a job, so if we are only passing the Entity IDs then tags should be sufficient. My concern was based on the design doc, section 7 (out of scope), point 1: I am under the assumption that posting Entities to ATS v2 can be done only by the RM, NM, and AM, and that the client will not be able to post Flow, Flow run, and Cluster Entities explicitly. Hence I wanted to know the approach for clients to post Flow, Flow run, and Cluster Entities. With regard to Cluster info, I remember Vrushali mentioning different clusters, such as a production and a test cluster, which they wanted to capture explicitly.

bq. 100 characters per tag seems like it should be enough; if not, we can maybe increase this limit? It is marked as @Evolving

If we are planning to pass Entity IDs to map the application hierarchy, then I feel 100 chars per tag should be sufficient. How about making the limit configurable, in case more information needs to be stored per tag?

bq. For example, setFlowId(String id) would simply set the tag

Yes, I agree that these are not first-class YARN concepts, so, as you mentioned, YARN applications can take care of simplifying it. +1 for this approach.
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326716#comment-14326716 ]

Robert Kanter commented on YARN-3040:
-------------------------------------

[~Naganarasimha]
# I think the Entities (YARN-3041) are mainly for writing/reading to/from the ATS store. Most of the information stored in those Entities are not needed by the user when submitting a job. All the user really needs to set is the IDs, and some of these we can make optional or determine automatically (e.g. it's obvious which cluster it's running on)
# 100 characters per tag seems like it should be enough; if not, we can maybe increase this limit? It is marked as {{@Evolving}}
# Like other properties, we can add a method to JobClient or one of those classes that sets the property. For example, {{setFlowId(String id)}} would simply set the tag

Flows and related constructs don't currently exist in YARN. Unless we add these as first-class concepts to the rest of YARN outside of the ATS (e.g. instead of only being able to submit YARN applications, you can also submit YARN flows; though this is looking more like Oozie...), I think tags are the only way to track this information.
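The tag-based setter suggested in this comment could look roughly like the sketch below. It is only an illustration: the class name {{FlowTagContext}}, the {{flow_id:}} and {{flow_run_id:}} prefixes, and the {{setFlowRunId}} method are hypothetical (the thread names only {{setFlowId(String id)}} and does not define a tag format), and the sketch does not call any YARN API. A real client would presumably hand the resulting set to {{ApplicationSubmissionContext#setApplicationTags}}.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that encodes flow identifiers as plain string tags. */
final class FlowTagContext {
    // Assumed tag prefixes; the discussion does not specify a tag format.
    static final String FLOW_ID_PREFIX = "flow_id:";
    static final String FLOW_RUN_ID_PREFIX = "flow_run_id:";

    private final Set<String> tags = new LinkedHashSet<>();

    /** Sets the flow ID by replacing any previously set flow-ID tag. */
    FlowTagContext setFlowId(String id) {
        replaceTag(FLOW_ID_PREFIX, id);
        return this;
    }

    /** Sets the flow run ID by replacing any previously set flow-run-ID tag. */
    FlowTagContext setFlowRunId(String id) {
        replaceTag(FLOW_RUN_ID_PREFIX, id);
        return this;
    }

    /** Returns a copy of the tags a client would pass along at submission. */
    Set<String> getTags() {
        return new LinkedHashSet<>(tags);
    }

    private void replaceTag(String prefix, String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("ID must be non-empty");
        }
        // Drop an older tag with the same prefix so each ID appears once.
        tags.removeIf(t -> t.startsWith(prefix));
        tags.add(prefix + value);
    }

    public static void main(String[] args) {
        FlowTagContext ctx = new FlowTagContext()
                .setFlowId("etl-daily")
                .setFlowRunId("1426636800000");
        System.out.println(ctx.getTags());
    }
}
```

Keeping one tag per prefix means calling a setter twice overwrites the earlier value instead of producing two conflicting flow IDs for the same application.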
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323700#comment-14323700 ]

Naganarasimha G R commented on YARN-3040:
-----------------------------------------

Hi [~rkanter],

Some queries related to tags and this jira:
# IIUC, users create the Flow and Flow run Entities externally and just give these IDs as tags at the time of app submission, so during creation of the app we ensure the hierarchies are updated properly. If my understanding is correct, how can a user create the Flow, Flow run, and Cluster Entities? Or is all the data related to the Flow, Flow run, and Cluster passed as part of the tags, and if the entities are not present we need to create them at the time of app submission?
# Hopefully the limitations of tags (a size of 100 chars and support for ASCII chars only) are not a concern for passing the information to YARN, but it is better to capture this if we are considering tags as the interface for passing flow and flow run information.
# IMHO, I would have liked an explicit interface for clients to pass this information rather than tags. Even though tags might serve the purpose, they don't seem like a graceful interface for clients.
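The two tag limitations raised in point 2 (100 characters, ASCII only) can be checked on the client side before submission. The sketch below is a standalone illustration, not YARN code: the class name {{TagLimitCheck}} and the method {{isValidTag}} are hypothetical, and the limit is hard-coded from the figure quoted in this comment rather than read from any YARN configuration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side check for the tag limits quoted in the thread. */
final class TagLimitCheck {
    // Limit as quoted in the discussion; assumed fixed for this sketch.
    static final int MAX_TAG_LENGTH = 100;

    /** Returns true if the tag is non-empty, at most 100 chars, and pure ASCII. */
    static boolean isValidTag(String tag) {
        if (tag == null || tag.isEmpty() || tag.length() > MAX_TAG_LENGTH) {
            return false;
        }
        // canEncode reports whether every character fits in US-ASCII.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(tag);
    }

    public static void main(String[] args) {
        System.out.println(isValidTag("flow_id:etl-daily"));
        System.out.println(isValidTag("flow_id:\u65e5\u6b21"));
    }
}
```

A flow name containing non-ASCII characters, or an ID longer than the limit once a prefix is added, would be rejected by this check, which is the kind of case the comment suggests capturing up front.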