[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390139#comment-15390139
 ] 

Chen Ge commented on YARN-4091:
-------------------------------

Thanks [~sunilg] for the comments and suggested improvements. Here are the 
corresponding modifications and some further comments.

For comment 1, I think multiple node heartbeats are never processed at the same 
time. They are handled sequentially, so {{startNodeUpdateRecording}} will not 
be entered by two node heartbeats at once, and there is no need to synchronize 
it.

For comment 2, {{activeRecordedNodes}} and {{recordingNodesAllocation}} 
together ensure that we record a complete node update after a request. A node 
is added to {{activeRecordedNodes}} as soon as the user requests it. 
{{startNodeUpdateRecording}} then puts the node into 
{{recordingNodesAllocation}} only if {{activeRecordedNodes}} contains it. 
Without {{activeRecordedNodes}}, we might begin recording activity in the 
middle of a node heartbeat. {{activeRecordedNodes}} is what lets us wait until 
the next node heartbeat begins.
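
To illustrate the two-stage idea, here is a minimal sketch. It is not the code 
from the patch: the class name, the String node ids, the {{completed}} map and 
the {{requestRecording}}, {{addActivity}}, {{finishNodeUpdateRecording}} and 
{{getCompleted}} methods are hypothetical, and removing the node from 
{{activeRecordedNodes}} when recording starts is an assumption. Thread safety 
is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; names other than activeRecordedNodes,
// recordingNodesAllocation and startNodeUpdateRecording are made up.
class NodeRecordingSketch {
  // Nodes a user has asked to record; recording has not started yet.
  private final Set<String> activeRecordedNodes = new HashSet<>();
  // Nodes whose current heartbeat is being recorded from its beginning.
  private final Map<String, List<String>> recordingNodesAllocation =
      new HashMap<>();
  // Finished recordings, one complete node update per node.
  private final Map<String, List<String>> completed = new HashMap<>();

  // User request; it may arrive in the middle of a node heartbeat.
  void requestRecording(String nodeId) {
    activeRecordedNodes.add(nodeId);
  }

  // Start of a node heartbeat: only now does recording really begin.
  void startNodeUpdateRecording(String nodeId) {
    // Assumption: a request is consumed by the heartbeat that serves it.
    if (activeRecordedNodes.remove(nodeId)) {
      recordingNodesAllocation.put(nodeId, new ArrayList<>());
    }
  }

  // Activities are dropped unless the heartbeat was recorded from its start.
  void addActivity(String nodeId, String activity) {
    List<String> activities = recordingNodesAllocation.get(nodeId);
    if (activities != null) {
      activities.add(activity);
    }
  }

  // End of a node heartbeat: publish the complete recording, if any.
  void finishNodeUpdateRecording(String nodeId) {
    List<String> activities = recordingNodesAllocation.remove(nodeId);
    if (activities != null) {
      completed.put(nodeId, activities);
    }
  }

  List<String> getCompleted(String nodeId) {
    return completed.get(nodeId);
  }
}
```

In this sketch, a request that arrives mid-heartbeat records nothing for that 
heartbeat, because the node only moves into {{recordingNodesAllocation}} when 
the next heartbeat starts.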

We have addressed comments 3, 4, 5 and 7 as suggested.

For comment 6, we have added a new intermediate utility class called 
{{ActivitiesLogger}}. Its operations are grouped into three classes: APP, QUEUE 
and NODE. They handle the "start", "add" and "finish" operations from the APP, 
QUEUE and NODE perspectives. CapacityScheduler, Queue and ContainerAllocator 
simply call the helper functions in {{ActivitiesLogger}}, and 
{{ActivitiesLogger}} invokes the corresponding operations in 
{{ActivitiesManager}}.
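
The layering described for comment 6 could look roughly like the sketch below. 
It is only an illustration and not the patch itself: {{ManagerSketch}} is a 
hypothetical stand-in for {{ActivitiesManager}} that just logs the calls it 
receives, and every method name, parameter and state string here is an 
assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ActivitiesManager; it only logs received calls.
class ManagerSketch {
  final List<String> calls = new ArrayList<>();

  void startRecording(String nodeId) {
    calls.add("start " + nodeId);
  }

  void addActivity(String nodeId, String source, String name, String state) {
    calls.add("add " + nodeId + " " + source + ":" + name + " " + state);
  }

  void finishRecording(String nodeId) {
    calls.add("finish " + nodeId);
  }
}

// Hypothetical facade: callers use the APP, QUEUE and NODE helpers and never
// talk to the manager directly.
final class ActivitiesLoggerSketch {
  private ActivitiesLoggerSketch() {
  }

  static final class APP {
    // "add" from the application perspective.
    static void recordAppActivity(ManagerSketch manager, String nodeId,
        String appId, String state) {
      manager.addActivity(nodeId, "APP", appId, state);
    }
  }

  static final class QUEUE {
    // "add" from the queue perspective.
    static void recordQueueActivity(ManagerSketch manager, String nodeId,
        String queueName, String state) {
      manager.addActivity(nodeId, "QUEUE", queueName, state);
    }
  }

  static final class NODE {
    // "start" from the node perspective.
    static void startNodeUpdateRecording(ManagerSketch manager,
        String nodeId) {
      manager.startRecording(nodeId);
    }

    // "finish" from the node perspective.
    static void finishNodeUpdateRecording(ManagerSketch manager,
        String nodeId) {
      manager.finishRecording(nodeId);
    }
  }
}
```

With this shape, scheduler code contains only one-line helper calls, and any 
change to how activities are stored stays inside the manager.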

Also, for comment 8, we have simplified the activities API. We deleted the 
updateState operation and kept only startRecording, addActivity, 
finishNodeAllocation and finishRecording. We combined similar calls and trimmed 
the passed parameters to keep them as clean as possible.

As for the minor nits, we have changed the function name as suggested.

Thanks again for the valuable comments.

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting the 
> assignment of containers to an application, or introducing a delay before 
> allocating a container. No clear information is passed down from the 
> scheduler to the outside world in these scenarios, which makes debugging much 
> harder.
> This ticket is an effort to introduce more well-defined states at the various 
> points in the scheduler where it skips or rejects container assignment, 
> activates an application, etc. Such information will help users know what is 
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.


