[ https://issues.apache.org/jira/browse/YARN-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-9035:
------------------------------------

    Assignee:     (was: Szilard Nemeth)

> Allow better troubleshooting of FS container assignments and lack of container assignments
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-9035
>                 URL: https://issues.apache.org/jira/browse/YARN-9035
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9035.001.patch
>
>
> The call chain that starts at {{FairScheduler.attemptScheduling}}, goes
> through {{FSQueue#assignContainer}} (parent / leaf queue) and ends at
> {{FSAppAttempt#assignContainer}} contains many calls and many conditions
> under which {{Resources.none()}} can be returned, meaning that no container
> is allocated.
>  Many of these empty assignments are not accompanied by a debug log
> statement, so it is very hard to tell which condition led the
> {{FairScheduler}} to decide not to allocate a container.
>  Conversely, in many places it is also difficult to tell why a container
> was allocated to an app attempt.
> The goal is to have a common place (i.e. a single class) that does all of
> this logging, so users can conveniently control these logs when they want
> to know why a container assignment did (or did not) happen.
>  It would also be handy if readers of the log could easily tell which
> {{AppAttempt}} a log record was created for; in other words, every log
> record should include the ID of the application / app attempt, where
> possible.
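A minimal sketch of the two requirements above: one common logging class whose single logger users can tune, and an app attempt ID on every record. It uses java.util.logging (FINE standing in for DEBUG) rather than Hadoop's own logging stack so that it runs without dependencies. The class name `AttemptScopedLog`, the logger name and the sample attempt ID are hypothetical and are not taken from the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of YARN-9035.001.patch: one shared logger for
// all assignment diagnostics, plus the app attempt ID bound at construction.
final class AttemptScopedLog {
    // Single, well-known logger name: changing its level turns all
    // assignment diagnostics on or off at once.
    static final Logger LOG = Logger.getLogger("fairscheduler.assignment.sketch");

    private final String appAttemptId;

    AttemptScopedLog(String appAttemptId) {
        this.appAttemptId = appAttemptId;
    }

    // Prefixes the message with the app attempt ID so each record is attributable.
    String format(String message) {
        return "[" + appAttemptId + "] " + message;
    }

    // FINE plays the role of DEBUG; the message is only built when it is enabled.
    void debug(String message) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(format(message));
        }
    }
}
```

For example, `new AttemptScopedLog("appattempt_1541000000000_0001_000001").debug("no pending requests")` logs a record whose message is `[appattempt_1541000000000_0001_000001] no pending requests`, but only once the logger is set to FINE or a more verbose level.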
>  
> Implementation details:
>  As most of the existing debug messages were guarded by a check of whether
> the DEBUG level is enabled on the logger, I followed the same pattern. All
> the relevant log messages are created through the class
> {{ResourceAssignment}}.
>  This class is a wrapper for the assigned {{Resource}} object and has a
> single logger, so clients should use its helper methods to create log
> records. A helper method called {{shouldLogReservationActivity}} checks
> whether the DEBUG or TRACE level is enabled on the logger.
>  See the javadoc of this class for further information.
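The wrapper described above might look roughly like the following sketch. Only the idea of a wrapper around the assigned resource with one logger and a `shouldLogReservationActivity` check comes from the description; `ResourceAssignmentSketch`, its `none`/`of` factories and the `SimpleResource` stand-in for YARN's `Resource` are hypothetical, and java.util.logging levels FINE and FINEST stand in for DEBUG and TRACE.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for YARN's Resource: memory in MB and virtual cores.
final class SimpleResource {
    static final SimpleResource NONE = new SimpleResource(0, 0);
    final long memoryMb;
    final int vcores;

    SimpleResource(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    boolean isNone() {
        return memoryMb == 0 && vcores == 0;
    }
}

// Hypothetical sketch of a wrapper that owns the single logger for all
// assignment-related messages; not the class from YARN-9035.001.patch.
final class ResourceAssignmentSketch {
    static final Logger LOG =
        Logger.getLogger(ResourceAssignmentSketch.class.getName());

    private final SimpleResource assigned;

    private ResourceAssignmentSketch(SimpleResource assigned) {
        this.assigned = assigned;
    }

    // FINE (~DEBUG) is loggable whenever FINEST (~TRACE) is, so one check covers both.
    static boolean shouldLogReservationActivity() {
        return LOG.isLoggable(Level.FINE);
    }

    // Empty assignment that records why nothing was allocated.
    static ResourceAssignmentSketch none(String reason) {
        if (shouldLogReservationActivity()) {
            LOG.fine("No container assigned: " + reason);
        }
        return new ResourceAssignmentSketch(SimpleResource.NONE);
    }

    // Non-empty assignment that records why the resource was allocated.
    static ResourceAssignmentSketch of(SimpleResource resource, String reason) {
        if (shouldLogReservationActivity()) {
            LOG.fine("Assigned <memory:" + resource.memoryMb
                + ", vCores:" + resource.vcores + ">: " + reason);
        }
        return new ResourceAssignmentSketch(resource);
    }

    SimpleResource getAssigned() {
        return assigned;
    }

    boolean isEmpty() {
        return assigned.isNone();
    }
}
```

Because every empty or non-empty assignment in this sketch goes through one of the two factories, the calling code never needs its own logger or its own level check.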
>  
> {{ResourceAssignment}} is also responsible for adding the app / app attempt
> ID to every log record (with some exceptions).
>  A couple of check classes are introduced: they are responsible for running
> the checks that a successful container allocation depends on and for
> storing their results.
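A sketch of how such check classes could run the preconditions of an allocation and keep their results, so that the first failing one can be named in a log record. `AllocationCheck`, `AllocationChecks` and the check names in the example are hypothetical; the description does not say which checks the patch introduces, and stopping at the first failure is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical check object: runs one named precondition of a container
// allocation and stores the outcome so it can be logged later.
final class AllocationCheck {
    final String name;
    private final BooleanSupplier condition;
    private Boolean passed; // null until run() has been called

    AllocationCheck(String name, BooleanSupplier condition) {
        this.name = name;
        this.condition = condition;
    }

    boolean run() {
        passed = condition.getAsBoolean();
        return passed;
    }

    Boolean result() {
        return passed;
    }
}

// Runs checks in order, stops at the first failure and remembers what ran.
final class AllocationChecks {
    private final List<AllocationCheck> executed = new ArrayList<>();

    boolean runAll(List<AllocationCheck> checks) {
        for (AllocationCheck check : checks) {
            executed.add(check);
            if (!check.run()) {
                return false;
            }
        }
        return true;
    }

    // Name of the first failed check, or null if every executed check passed.
    String firstFailure() {
        for (AllocationCheck check : executed) {
            if (Boolean.FALSE.equals(check.result())) {
                return check.name;
            }
        }
        return null;
    }

    List<AllocationCheck> executed() {
        return Collections.unmodifiableList(executed);
    }
}
```

Keeping the executed checks, rather than returning a bare boolean, is what would let a log record state why no container was assigned.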



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
