[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370898#comment-16370898
 ] 

Xuan Gong commented on YARN-7952:
---------------------------------

Right now, each NM periodically sends its own log aggregation status to the RM. 
The RM aggregates the status for each application, but it does not generate 
the final status until a client call (from the web UI or CLI) triggers it. 
The RM never persists the log aggregation status, so when the RM restarts or 
fails over, the status goes back to “NOT_STARTED”. This is confusing; 
maybe we should change it to “NOT_AVAILABLE” (I will create a separate ticket 
for this). Either way, we need to persist the log aggregation status for 
future use.

Option one: the centralized approach.

Create a new service in the RM, called LogAggregationTrackingService, which 
tracks the log aggregation status for all applications. We can also introduce 
“EXPIRY_INTERVAL_MS”. The service wakes up periodically to check the log 
aggregation progress, similar to a liveliness monitor (such as 
AMLivelinessMonitor). Once EXPIRY_INTERVAL_MS has elapsed, the service 
triggers an RMStateStore update event to persist the final log aggregation 
status. The cost is one more RMStateStore event for every application. Also, 
if an RM restart or fail-over happens within the EXPIRY_INTERVAL_MS window, 
before the final status is persisted, we still lose the log aggregation 
status.
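
A rough sketch of how such a tracker could behave is below. It is not a patch 
against YARN: the class, the StatusStore interface (a stand-in for the 
RMStateStore update event) and the string statuses are all hypothetical. It 
also assumes the expiry timer starts at the first status report for an 
application and is not reset by later reports, which the proposal above does 
not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Option one; none of these names exist in YARN.
class LogAggregationTrackerSketch {

  /** Stand-in for the RMStateStore update event (assumption). */
  interface StatusStore {
    void persistFinalStatus(String appId, String status);
  }

  /** Latest status seen for one application plus the time tracking began. */
  private static final class Tracked {
    String status;
    final long startedAtMs;

    Tracked(String status, long startedAtMs) {
      this.status = status;
      this.startedAtMs = startedAtMs;
    }
  }

  private final long expiryIntervalMs;
  private final StatusStore store;
  private final Map<String, Tracked> tracked = new HashMap<>();

  LogAggregationTrackerSketch(long expiryIntervalMs, StatusStore store) {
    this.expiryIntervalMs = expiryIntervalMs;
    this.store = store;
  }

  /** Called whenever the aggregated status of an application changes. */
  synchronized void update(String appId, String status, long nowMs) {
    Tracked t = tracked.get(appId);
    if (t == null) {
      // First report: start the expiry timer for this application.
      tracked.put(appId, new Tracked(status, nowMs));
    } else {
      // Later reports only replace the status; the timer keeps running.
      t.status = status;
    }
  }

  /**
   * Periodic check: persist and stop tracking every application whose
   * expiry interval has elapsed. Returns the ids that were persisted.
   */
  synchronized List<String> checkExpired(long nowMs) {
    List<String> persisted = new ArrayList<>();
    Iterator<Map.Entry<String, Tracked>> it = tracked.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Tracked> e = it.next();
      if (nowMs - e.getValue().startedAtMs >= expiryIntervalMs) {
        store.persistFinalStatus(e.getKey(), e.getValue().status);
        persisted.add(e.getKey());
        it.remove();
      }
    }
    return persisted;
  }
}
```

Anything still in the in-memory map when the RM goes down is lost, which is 
the gap mentioned above for a restart inside the expiry window.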

Option two: only care about the log aggregation status of the most recent 
applications.

This approach does not persist the log aggregation status in the RM, so we do 
not need to trigger a new RMStateStore event. When the NM sends the log 
aggregation status to the RM, it also keeps a copy in its own memory (do we 
need to persist it in the NM state store?). We also introduce 
“EXPIRY_INTERVAL_MS”. When the RM restarts or fails over, the NM re-registers 
with the RM. At that point, the NM resends its saved copy of the log 
aggregation status for every application that is still within the configured 
“EXPIRY_INTERVAL_MS” (current_timestamp - last_updated_timestamp <= 
EXPIRY_INTERVAL_MS), so the RM can regenerate the log aggregation status. 
Most of the changes would be on the NM side. 
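
The NM-side filtering could look roughly like the sketch below. The class and 
method names are hypothetical, statuses are plain strings, and evicting stale 
entries during re-registration is an assumption. Only the expiry condition 
(current_timestamp - last_updated_timestamp <= EXPIRY_INTERVAL_MS) is taken 
from the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Option two; none of these names exist in YARN.
class NodeLogAggregationCacheSketch {

  /** Copy of one status report that the NM has already sent to the RM. */
  static final class Report {
    final String appId;
    final String status;
    final long lastUpdatedMs;

    Report(String appId, String status, long lastUpdatedMs) {
      this.appId = appId;
      this.status = status;
      this.lastUpdatedMs = lastUpdatedMs;
    }
  }

  private final long expiryIntervalMs;
  // Insertion-ordered so that resent reports come out in a stable order.
  private final Map<String, Report> lastSent = new LinkedHashMap<>();

  NodeLogAggregationCacheSketch(long expiryIntervalMs) {
    this.expiryIntervalMs = expiryIntervalMs;
  }

  /** Remember the latest status sent to the RM for an application. */
  synchronized void recordSent(String appId, String status, long nowMs) {
    lastSent.put(appId, new Report(appId, status, nowMs));
  }

  /**
   * Called when the NM re-registers after an RM restart or fail-over.
   * Drops reports older than the expiry interval (assumption) and returns
   * the remaining ones so they can be resent to the RM.
   */
  synchronized List<Report> reportsForReRegistration(long nowMs) {
    lastSent.values().removeIf(r -> nowMs - r.lastUpdatedMs > expiryIntervalMs);
    return new ArrayList<>(lastSent.values());
  }
}
```

Because the copy lives only in NM memory here, an NM restart would lose it, 
which is where the open question about the NM state store comes in.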

Option three: Option one + Option two

> Find a way to persist the log aggregation status
> ------------------------------------------------
>
>                 Key: YARN-7952
>                 URL: https://issues.apache.org/jira/browse/YARN-7952
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>
> In MAPREDUCE-6415, we created a CLI to har the aggregated logs, and in 
> YARN-4946 (RM should write out Aggregated Log Completion file flag next to 
> logs), we discussed how to get the log aggregation status: make a client 
> call to the RM or read it directly from the distributed file system (HDFS).
> Whichever approach we choose, we first need a way to persist the log 
> aggregation status. This ticket tracks the work for that purpose.



