[jira] [Commented] (YARN-4063) Populate the flow activity table

Vrushali C (JIRA) Wed, 19 Aug 2015 11:45:02 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703541#comment-14703541
 ]


Vrushali C commented on YARN-4063:
----------------------------------


Current line of thinking:
- On the application created and application finished events, the start time 
and end time of the flow can be updated for that run id for that day.

- We can use coprocessors here. At compaction time, if it is close to the end 
of the day and the flow record has no end time for this run id, we can add a 
snapshot time to indicate that the flow is still running.

- The coprocessor can also read and return the min start time and max end 
time for that flow run id (similar to what is being done in the flow_run table).
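
A rough sketch of the bookkeeping described above, in plain Java rather than as 
an actual HBase coprocessor. The class name, method names, and the 
"close to end of day" window are all hypothetical and only illustrate the 
min start / max end / snapshot time logic; nothing here is the real 
timeline service code.

```java
import java.util.OptionalLong;

/** Hypothetical in-memory model of one run id's values in a flow activity record. */
public class FlowRunActivity {
    // Sentinels mean "no value seen yet" (an assumption made for this sketch).
    private long minStartTime = Long.MAX_VALUE;
    private long maxEndTime = Long.MIN_VALUE;
    private OptionalLong snapshotTime = OptionalLong.empty();

    /** Application created: keep the earliest start time seen for this run id. */
    public void onApplicationCreated(long timestampMillis) {
        minStartTime = Math.min(minStartTime, timestampMillis);
    }

    /** Application finished: keep the latest end time seen for this run id. */
    public void onApplicationFinished(long timestampMillis) {
        maxEndTime = Math.max(maxEndTime, timestampMillis);
    }

    public boolean hasEndTime() {
        return maxEndTime != Long.MIN_VALUE;
    }

    /**
     * Compaction-time hook: if we are within windowMillis of the end of the day
     * and no end time has been recorded, store a snapshot time to mark the
     * flow as still running. windowMillis is an assumed tuning parameter.
     */
    public void onCompaction(long nowMillis, long dayEndMillis, long windowMillis) {
        boolean nearEndOfDay =
            nowMillis >= dayEndMillis - windowMillis && nowMillis <= dayEndMillis;
        if (nearEndOfDay && !hasEndTime()) {
            snapshotTime = OptionalLong.of(nowMillis);
        }
    }

    public long getMinStartTime() { return minStartTime; }
    public long getMaxEndTime() { return maxEndTime; }
    public OptionalLong getSnapshotTime() { return snapshotTime; }
}
```

In a real coprocessor the same min/max merge would presumably be applied to 
cell values during reads and compactions instead of to Java fields.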


> Populate the flow activity table
> --------------------------------
>
>                 Key: YARN-4063
>                 URL: https://issues.apache.org/jira/browse/YARN-4063
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>
> Need to populate the flow_activity table:
> - Stores per-day flow run pointers and info.
> - Written to by the RM’s collector for application lifecycle events.
> - Primary key: cluster ! day timestamp ! user ! flow id 
> - For the day timestamp we can take the millis since epoch for the end of the 
> day (24:00h).
> - Columns include run ids, start time, end time, snapshot time.
> - This table will also be used to efficiently retrieve the flows that had 
> activity on a given day. That is needed for daily aggregations, but also 
> for several UIs, including a flow-based UI.
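
As an illustration of the proposed primary key, here is a minimal sketch that 
builds "cluster ! day timestamp ! user ! flow id" with the day timestamp set 
to the millis since epoch for the end of the day. The class and method names 
are hypothetical, and the use of UTC day boundaries, a bare "!" separator, 
and a String key (rather than encoded bytes) are assumptions made for 
readability, not something the issue specifies.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper for building flow_activity row keys. */
public class FlowActivityRowKey {
    private static final String SEPARATOR = "!";

    /** Millis since epoch for 24:00h of the UTC day containing eventMillis. */
    public static long endOfDayMillis(long eventMillis) {
        LocalDate day = Instant.ofEpochMilli(eventMillis)
            .atZone(ZoneOffset.UTC)
            .toLocalDate();
        // 24:00h of a day is the start of the following day.
        return day.plusDays(1)
            .atStartOfDay(ZoneOffset.UTC)
            .toInstant()
            .toEpochMilli();
    }

    /** Joins the key parts in the order: cluster, day timestamp, user, flow id. */
    public static String rowKey(String cluster, long eventMillis,
                                String user, String flowId) {
        return String.join(SEPARATOR,
            cluster,
            Long.toString(endOfDayMillis(eventMillis)),
            user,
            flowId);
    }
}
```

With this layout, all flows active for a given cluster on a given day share a 
key prefix, so a prefix scan on "cluster!dayTimestamp" would return them 
together.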



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
