[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371899#comment-14371899 ]
Sangjin Lee commented on YARN-3040:
-----------------------------------

Hi [~zjshen], thanks much for working on this. I just took a quick look at the patch and the discussion. It seems like you'll update it soon, but I'll pass along my comments just in case.

One high-level comment: the original intent of this JIRA is more of an end-to-end flow of the flow information (flow name, flow version, and flow run id). How can individual frameworks (MR, tez, ...) set these attributes and pass them to the RM at the time of the application launch? How does that information get passed to the TimelineClient and to the timeline collector? We do need the API from the beginning portion of the end-to-end picture as well.

bq. new TimelineClient is constructed per application, and in the context of one application, we can reasonably assume this context information should be unchanged.

There are a couple of things to consider here (and it sounds like that may be part of the offline discussion). We need to make sure we handle the case of NMs writing container-related info. It sounds like each NM will need to have multiple timeline clients (one for each application). More importantly, we need to think about the RM use case. The RM will have its own collector, and it does not go through the TimelineClient API. How would that work?
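
To make the end-to-end question a bit more concrete, here is a minimal sketch of how a framework might bundle the three flow attributes and hand them over as plain string tags at submission time. Everything in it is hypothetical and not taken from the patch or from any existing YARN API: the `FlowContext` class, its `toTags`/`fromTags` methods, and the tag prefixes are made up for illustration, and typing the run id as a `long` is an assumption. It also ignores whatever constraints YARN places on tags.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical holder for the flow attributes; not part of the YARN-3040 patch. */
final class FlowContext {
  // Hypothetical tag prefixes, chosen only for this sketch.
  static final String NAME_PREFIX = "flow_name:";
  static final String VERSION_PREFIX = "flow_version:";
  static final String RUN_ID_PREFIX = "flow_run_id:";

  private final String flowName;
  private final String flowVersion;
  private final long flowRunId; // assumption: a numeric run id

  FlowContext(String flowName, String flowVersion, long flowRunId) {
    this.flowName = Objects.requireNonNull(flowName, "flowName");
    this.flowVersion = Objects.requireNonNull(flowVersion, "flowVersion");
    this.flowRunId = flowRunId;
  }

  String getFlowName() { return flowName; }
  String getFlowVersion() { return flowVersion; }
  long getFlowRunId() { return flowRunId; }

  /** Encodes the context as string tags a framework could attach at submission. */
  Set<String> toTags() {
    Set<String> tags = new LinkedHashSet<>();
    tags.add(NAME_PREFIX + flowName);
    tags.add(VERSION_PREFIX + flowVersion);
    tags.add(RUN_ID_PREFIX + flowRunId);
    return tags;
  }

  /** Rebuilds the context from tags; unrelated tags are ignored. */
  static FlowContext fromTags(Set<String> tags) {
    String name = null;
    String version = null;
    Long runId = null;
    for (String tag : tags) {
      if (tag.startsWith(NAME_PREFIX)) {
        name = tag.substring(NAME_PREFIX.length());
      } else if (tag.startsWith(VERSION_PREFIX)) {
        version = tag.substring(VERSION_PREFIX.length());
      } else if (tag.startsWith(RUN_ID_PREFIX)) {
        runId = Long.parseLong(tag.substring(RUN_ID_PREFIX.length()));
      }
    }
    if (name == null || version == null || runId == null) {
      throw new IllegalArgumentException("incomplete flow context: " + tags);
    }
    return new FlowContext(name, version, runId);
  }
}
```

In this sketch the framework would call `toTags()` when it builds the submission, and whichever component writes to the timeline service would call `fromTags()` on the receiving side. How the RM's own collector would obtain the same context is exactly the open question above.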

More individual comments:
- flowId should be flowName (that's the standard terminology we're using)
- flow version seems to be missing from this; while flow version is not part of the primary key of the entity, it is a necessary attribute
- I think flow run id can (and should) be a long; it doesn't have to be a generic string
- in light of this, it might be slightly better to have a (flow) context API rather than individual arguments where you can set all these flow-related attributes
- the default cluster id should be just the cluster name; I'm not sure why we need to add the cluster start timestamp; it would mean that every restart of the resource manager would create a new logical cluster in the timeline service; I'm not sure I agree with that
- hopefully isUnitTest can be removed with the changes I made in the previous commit

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*.
> Frameworks should be able to define and pass in all attributes of flows and
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)