[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720392#comment-14720392
 ] 

Vrushali C commented on YARN-3901:
----------------------------------

Hi [~gtCarrera9]

Thanks for the first review pass! To answer your questions:
bq.  IIUC, we now directly write data into the flow run related tables upon 
application start, finish, and periodic flush, and we only perform the 
aggregations in our coprocessors
Yes, data is written to the flow run and flow activity tables with a quick, 
simple write, but the correct values to be returned are determined at read time 
AND (TBD) at flush/compaction time. During flush/compaction, the data from 
various cells will be 'merged' into a smaller number of cells so that subsequent 
reads are faster.
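
To make the write/read/compaction split concrete, here is a minimal sketch in 
plain Java. It is not HBase code and not the code in the patch: the class 
MinStartTimeSketch, its Cell type and the three methods are hypothetical names 
chosen for illustration, and an in-memory list stands in for the cells of one 
column. It uses the min_start_time column from the issue description quoted 
further down as the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for one column of a flow run row. */
final class MinStartTimeSketch {

    /** One cell: the tag is the application id, or "" once compacted. */
    static final class Cell {
        final String tag;
        final long value;

        Cell(String tag, long value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Write path: a plain append, with no read-modify-write. */
    static void write(List<Cell> column, String appId, long startTime) {
        column.add(new Cell(appId, startTime));
    }

    /** Read path: the value returned is the min over all cells present. */
    static long readMin(List<Cell> column) {
        long min = Long.MAX_VALUE;
        for (Cell c : column) {
            min = Math.min(min, c.value);
        }
        return min;
    }

    /** Flush/compaction: keep a single untagged cell holding the min. */
    static List<Cell> compact(List<Cell> column) {
        List<Cell> merged = new ArrayList<>();
        if (!column.isEmpty()) {
            merged.add(new Cell("", readMin(column)));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Cell> column = new ArrayList<>();
        write(column, "application_1", 1500L);
        write(column, "application_2", 1200L);
        write(column, "application_3", 1800L);

        long before = readMin(column);
        List<Cell> compacted = compact(column);
        long after = readMin(compacted);

        // The read result is the same before and after compaction;
        // only the number of cells scanned changes.
        System.out.println(before + " " + after + " " + compacted.size());
    }
}
```

The point of the sketch is that compaction changes only how many cells a read 
has to scan, not the value the read returns.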

bq.  How are those coprocessor connected. Is it through an Hbase configuration 
externally, or there're some lines set them up in this patch that I missed 
(which is quite possible)?

At table creation time, we specify the coprocessor class. This can also be 
done later with an alter table command if desired.

bq. I noticed you're performing aggregation work in the coprocessor 
(FlowScanner), this is slightly different to the approach in YARN-3816 (app 
level aggregation). My hunch is that we may need some sort of common APIs for 
aggregating metrics, so that we can centralize the aggregation logic? Or, why 
is the flow run level aggregation significantly different to app level 
aggregation (so that we cannot share the same aggregation logic)?

I think there are some differences between the two aggregations, and I am not 
sure the classes can be reused without complicating development efforts. For 
the PoC I would like to focus on these tables independently. We could file 
follow-up jiras to refactor the code as we see fit once the whole picture 
emerges. Does that sound good?

Keep the questions coming, thanks!


> Populate flow run data in the flow_run table
> --------------------------------------------
>
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per flow run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
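
The tag rules for the m! metric columns can be illustrated with a minimal 
sketch in plain Java. It is not HBase code and not the code in the patch: 
FlowMetricSketch and all of its members are hypothetical names. Two further 
assumptions are made for the sketch: only the latest cell per application is 
kept, and a replayed write for an already aggregated application is dropped at 
write time. The description above does not say where that replay check lives.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for one metric column of a flow run. */
final class FlowMetricSketch {
    // Tag types from the description: running (1) or complete (2).
    static final int RUNNING = 1;
    static final int COMPLETE = 2;

    /** A tagged cell: the tag type plus the metric value for one app. */
    static final class Cell {
        final int tagType;
        final long value;

        Cell(int tagType, long value) {
            this.tagType = tagType;
            this.value = value;
        }
    }

    // Assumption: one cell per application id, latest write wins.
    private final Map<String, Cell> taggedCells = new LinkedHashMap<>();
    // Value of the untagged cell that collapsed metrics are folded into.
    private long collapsed = 0L;
    // Stand-in for the separate column of already aggregated app ids.
    private final Set<String> aggregatedApps = new HashSet<>();

    void write(String appId, int tagType, long value) {
        if (aggregatedApps.contains(appId)) {
            return; // replayed write for an app that was already collapsed
        }
        taggedCells.put(appId, new Cell(tagType, value));
    }

    /** Read path: sum the untagged cell and every tagged cell. */
    long read() {
        long sum = collapsed;
        for (Cell c : taggedCells.values()) {
            sum += c.value;
        }
        return sum;
    }

    /** Compaction: collapse only cells tagged complete; keep running ones. */
    void compact() {
        Iterator<Map.Entry<String, Cell>> it = taggedCells.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Cell> e = it.next();
            if (e.getValue().tagType == COMPLETE) {
                collapsed += e.getValue().value;
                aggregatedApps.add(e.getKey());
                it.remove();
            }
        }
    }

    int taggedCellCount() {
        return taggedCells.size();
    }

    public static void main(String[] args) {
        FlowMetricSketch metric = new FlowMetricSketch();
        metric.write("application_1", COMPLETE, 40L);
        metric.write("application_2", RUNNING, 10L);
        metric.write("application_3", COMPLETE, 25L);
        System.out.println(metric.read());

        metric.compact();
        System.out.println(metric.read() + " " + metric.taggedCellCount());

        metric.write("application_1", COMPLETE, 40L); // replay, ignored
        System.out.println(metric.read());
    }
}
```

In this sketch the read result is the same before and after compaction, and 
the replayed write for application_1 does not change it.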



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
