[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612592#comment-14612592 ]
Sangjin Lee commented on YARN-3815:
-----------------------------------

Here is my take on what's consensus, what's not, and what's currently out of scope. I may have misread the discussion and your impression/understanding may be different, so please feel free to chime in and comment on this!

(consensus or not controversial)
- applications table will be split from the main entities table
- app-level aggregation for framework-specific metrics will be done by the AM
- app-level aggregation for YARN-system container metrics will be done by the per-app timeline collector
- real-time aggregation does simple sum for all types of metrics
- metrics API will be updated to differentiate gauges and counters (the type information will need to be persisted in the storage)
- for gauges, in addition to the simple sum-based aggregation, support average and max
- the flow-run table will be created to handle app-to-flow-run ("real-time") aggregation as proposed in the native HBase schema design
- auxiliary tables will be implemented as proposed in the native HBase schema design
- time-based aggregation (daily, weekly, monthly, etc.) will be done via Phoenix tables to enable ad-hoc queries

(questions remaining or undecided)
- for the average/max support for gauges (see above), confirm that's exactly what we want to support
- how to implement app-to-flow-run aggregation for gauges
- how to perform the time-based aggregation (MapReduce, using coprocessor endpoints, etc.)
- how to handle long-running apps for time-based aggregation
- considering adopting "null delimiters" (or other Phoenix-friendly tools) to support Phoenix reading data from the native HBase tables
- using flow collectors, user collectors, and queue collectors as means of performing (higher-level) aggregation

(out of scope)
- support per-container averages for gauges
- any aggregation other than time-based aggregation for flows, users, and queues
- creating a dependency on the explicit YARN flow API


> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>
>                 Key: YARN-3815
>                 URL: https://issues.apache.org/jira/browse/YARN-3815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf, aggregation-design-discussion.pdf, hbase-schema-proposal-for-aggregation.pdf
>
>
> Per previous discussions in some design documents for YARN-2928, the basic scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow
> - User level, expect return: aggregated stats for applications submitted by user
> - Queue level, expect return: aggregated stats for applications within the Queue
> Application states is the basic building block for all other level aggregations. We can provide Flow/User/Queue level aggregated statistics info based on application states (a dedicated table for application states is needed which is missing from previous design documents like HBase/Phoenix schema design).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)