[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated YARN-3041:
------------------------------
    Attachment: YARN-3041.5.patch

Thanks for the feedback, Sangjin, Vrushali and Joep! We had an offline discussion, and I have updated the patch accordingly. Here's a summary of the major changes:

1. It is not necessary to have both Flow and FlowRun in the taxonomy, as the two concepts are mostly the same. FlowRun is more likely to model an individual flow instance made up of a number of applications, while Flow sounds like the generic perspective of application organization, which may nest multiple FlowRun instances. Hence, we only need FlowRun, but we rename it to Flow for simplicity.

2. To address the aggregation interval, i.e., we may want to query the aggregated information for a particular time window, I changed TimelineMetric to have starttime and endtime attributes.

3. The types of the first-class citizen entities are defined centrally as enums, and the parent-child relationships are defined there too.

4. In the write path, queue is a string attribute of the application and user is a string attribute of the flow, while we still have entities for both to hold the aggregated data on the reader side. One additional implication is that all the applications are going to be run by the same user as the parent flow.

5. The flow id is a composite, user@flow_name(or id)/version/run, which uniquely identifies a flow in the storage.

Joep has raised a great point about keeping the type generic to extend the data model beyond YARN, e.g. to Mesos. I think we can think about and discuss it more, but let's file a separate Jira to tackle that direction. Here, as mentioned above, let's try to get the first draft of the data model in ASAP to unblock the aggregator and reader work. Hopefully this makes sense to the folks here.
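To make points 2 through 5 more concrete, here is a minimal Java sketch of how such objects could fit together. Every name in it (EntityType, FlowId, ApplicationEntity, TimelineMetric, and their fields and methods) is hypothetical and is not taken from YARN-3041.5.patch. The parents chosen for USER and QUEUE, the millisecond time unit, the single double metric value, and the rule that flow id components may not contain '@' or '/' are also assumptions made for illustration.

```java
// HYPOTHETICAL sketch of the data model described in the comment above.
// None of these names are taken from YARN-3041.5.patch.

/** Point 2: a metric value tagged with the aggregation window it covers. */
final class TimelineMetric {
    final String id;
    final long startTime, endTime; // assumed: epoch millis, startTime <= endTime
    final double value;            // assumed: a single aggregated numeric value

    TimelineMetric(String id, long startTime, long endTime, double value) {
        if (endTime < startTime) {
            throw new IllegalArgumentException("endTime must not precede startTime");
        }
        this.id = id;
        this.startTime = startTime;
        this.endTime = endTime;
        this.value = value;
    }

    /** True if this metric's window lies entirely inside [queryStart, queryEnd]. */
    boolean within(long queryStart, long queryEnd) {
        return startTime >= queryStart && endTime <= queryEnd;
    }
}

/** Point 3: first-class entity types, defined centrally with their parent type. */
enum EntityType {
    CLUSTER(null),
    USER(CLUSTER),   // assumed parent: users are aggregated per cluster
    QUEUE(CLUSTER),  // assumed parent: queues are aggregated per cluster
    FLOW(CLUSTER),   // Flow absorbs the former FlowRun (point 1)
    APPLICATION(FLOW);

    private final EntityType parent;

    EntityType(EntityType parent) {
        this.parent = parent;
    }

    boolean isParentOf(EntityType child) {
        return child.parent == this;
    }
}

/** Point 4: queue is a plain string attribute of the application. */
final class ApplicationEntity {
    final String appId, queue;
    final FlowId flow;

    ApplicationEntity(String appId, String queue, FlowId flow) {
        this.appId = appId;
        this.queue = queue;
        this.flow = flow;
    }

    /** The user lives on the flow, so every application runs as its parent flow's user. */
    String user() {
        return flow.user;
    }
}

/** Point 5: composite flow id of the form user@flow_name/version/run. */
final class FlowId {
    final String user, flowName, version, run;

    FlowId(String user, String flowName, String version, String run) {
        // Assumption: no component may be empty or contain a separator ('@' or '/').
        for (String part : new String[] {user, flowName, version, run}) {
            if (part == null || part.isEmpty() || part.contains("@") || part.contains("/")) {
                throw new IllegalArgumentException("bad flow id component: " + part);
            }
        }
        this.user = user;
        this.flowName = flowName;
        this.version = version;
        this.run = run;
    }

    static FlowId parse(String id) {
        int at = id.indexOf('@');
        String[] rest = id.substring(at + 1).split("/", -1);
        if (at < 0 || rest.length != 3) {
            throw new IllegalArgumentException("expected user@flow_name/version/run: " + id);
        }
        return new FlowId(id.substring(0, at), rest[0], rest[1], rest[2]);
    }

    @Override
    public String toString() {
        return user + "@" + flowName + "/" + version + "/" + run;
    }
}
```

For example, `FlowId.parse("alice@etl/v2/17")` splits the id into user `alice`, flow name `etl`, version `v2` and run `17`, and `toString()` rebuilds the same string. Because the separators are forbidden inside components, that string can act as a unique storage key, and `ApplicationEntity.user()` simply reads the user off the parent flow rather than storing it per application.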
> [Data Model] create overall data objects of TS next gen
> -------------------------------------------------------
>
>                 Key: YARN-3041
>                 URL: https://issues.apache.org/jira/browse/YARN-3041
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch,
> YARN-3041.3.patch, YARN-3041.4.patch, YARN-3041.5.patch,
> YARN-3041.preliminary.001.patch
>
>
> Per design in YARN-2928, create the ATS entity and events API.
> Also, as part of this JIRA, create YARN system entities (e.g. cluster, user,
> flow, flow run, YARN app, ...).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)