[ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857340#comment-15857340
 ] 

Wilfred Spiegelenburg commented on YARN-6042:
---------------------------------------------

I looked at the changes, and they will make debugging the FS a lot easier once 
we get this into a release.

A couple of things:
# In the FairScheduler change you add a new method {{dumpSchedulerState()}}. Why 
are you not passing the rootQueue in to the method? It saves getting it again, 
since you already have it in the update method.
# The {{dumpStateInternal()}} for the FSLeafQueue is missing an application 
count: {{getNumPendingApps()}} or {{getNumActiveApps()}}. We need at least one 
of those to have a full view of the application state in the queue.
# We add the LastTimeAtMinShare but not the LastTimeAtFairShare for the leaf 
queue, as per {{getLastTimeAtFairShareThreshold()}}.
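
To make the three points above concrete, here is a minimal sketch of what I have in mind. It is not the code from the patch: {{LeafQueueInfo}}, its fields, and the output format are hypothetical stand-ins, and the real FSLeafQueue and FairScheduler signatures may differ.

```java
public class QueueDumpSketch {
    /** Hypothetical stand-in for a leaf queue; not the real FSLeafQueue. */
    static final class LeafQueueInfo {
        final String name;
        final int numActiveApps;
        final int numPendingApps;
        final long lastTimeAtMinShare;
        final long lastTimeAtFairShare;

        LeafQueueInfo(String name, int numActiveApps, int numPendingApps,
                      long lastTimeAtMinShare, long lastTimeAtFairShare) {
            this.name = name;
            this.numActiveApps = numActiveApps;
            this.numPendingApps = numPendingApps;
            this.lastTimeAtMinShare = lastTimeAtMinShare;
            this.lastTimeAtFairShare = lastTimeAtFairShare;
        }

        /** Dump including the app counts and both "last time at" values. */
        String dumpStateInternal() {
            return "{Name: " + name
                + ", NumActiveApps: " + numActiveApps
                + ", NumPendingApps: " + numPendingApps
                + ", LastTimeAtMinShare: " + lastTimeAtMinShare
                + ", LastTimeAtFairShare: " + lastTimeAtFairShare + "}";
        }
    }

    /** The caller already holds the root queue, so it is passed in. */
    static String dumpSchedulerState(LeafQueueInfo rootQueue) {
        return rootQueue.dumpStateInternal();
    }

    public static void main(String[] args) {
        LeafQueueInfo queue =
            new LeafQueueInfo("root.default", 2, 1, 100L, 200L);
        System.out.println(dumpSchedulerState(queue));
    }
}
```

The only design point is that the method that already has the queue in hand passes it along, and that the dump carries both app counts and both timestamps.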

I am also a bit worried about the test: in the output we build the debug string 
and get the time in milliseconds for the LastTimeAtMinShare. What if the 
{{updateStarvationStats()}} call ran 1 millisecond earlier than the debug 
string was built? The timestamps would differ, the comparison would fail, and 
the test would fail with it. I don't think we can guarantee that those two 
calls happen in the same millisecond.
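
One way around this, sketched below, is to drive both calls from a clock that the test controls, so the timestamp cannot change between them. The names here ({{ManualClock}}, {{LeafQueueSketch}}) are hypothetical and only illustrate the idea; in the real test, Hadoop's own {{org.apache.hadoop.yarn.util.Clock}} abstraction could probably play this role, but I have not checked how the patch wires the clock in.

```java
import java.util.function.LongSupplier;

public class DumpTimingSketch {
    /** Hypothetical test clock: time only moves when the test says so. */
    static final class ManualClock implements LongSupplier {
        private long nowMs;

        ManualClock(long startMs) {
            this.nowMs = startMs;
        }

        @Override
        public long getAsLong() {
            return nowMs;
        }

        void advance(long deltaMs) {
            nowMs += deltaMs;
        }
    }

    /** Hypothetical stand-in for a leaf queue; not the real FSLeafQueue. */
    static final class LeafQueueSketch {
        private final LongSupplier clock;
        private long lastTimeAtMinShare;

        LeafQueueSketch(LongSupplier clock) {
            this.clock = clock;
            this.lastTimeAtMinShare = clock.getAsLong();
        }

        /** Records the current clock time, as the starvation update would. */
        void updateStarvationStats() {
            lastTimeAtMinShare = clock.getAsLong();
        }

        String dumpState() {
            return "LastTimeAtMinShare: " + lastTimeAtMinShare;
        }
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock(1_000L);
        LeafQueueSketch queue = new LeafQueueSketch(clock);

        clock.advance(5L);
        queue.updateStarvationStats();

        // The clock has not moved since the update, so both sides agree
        // no matter how long the real calls take.
        String expected = "LastTimeAtMinShare: " + clock.getAsLong();
        System.out.println(expected.equals(queue.dumpState()));
    }
}
```

With a clock like this, the test compares against a value it set itself instead of racing the wall clock.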

> Fairscheduler: Dump scheduler state in log
> ------------------------------------------
>
>                 Key: YARN-6042
>                 URL: https://issues.apache.org/jira/browse/YARN-6042
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-6042.001.patch, YARN-6042.002.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to 
> be able to dump the scheduler state into a log on request.
> Dumping the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but also not assigning containers.
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in, and we have to make assumptions or guess.
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share


