[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Naganarasimha G R updated YARN-3946:
------------------------------------
    Attachment: YARN-3946.v1.005.patch

Thanks for the comments [~wangda],

bq. When app goes to final state (FINISHED/KILLED, etc.), should we simply set AMLaunchDiagnostics to null?
IIUC you are referring to RMAppAttemptImpl, right? If so, it was a mistake on my side: while making corrections based on your previous comment, I missed reverting this part. In any case, as per your 4th comment, for unmanaged AMs I have updated it to null here.

bq. Why need two separate methods: updateDiagnosticsIfNotRunning/updateDiagnostics?
The names may need improving, but two methods are required. At some call sites the status must be updated only if the AM is not running. For example, FiCaSchedulerApp.allocate is called whenever a container is assigned to an app, but we want to update the diagnostics only while the AM is not yet launched; the same applies to the call in LeafQueue.assignContainers. In other cases we are sure that the AM is not yet launched, so to avoid an unnecessary check (whether the AM is running) we have updateDiagnostics. Maybe I can name them {{checkAndUpdateAMContainerDiagnostics}} and {{updateAMContainerDiagnostics}}?

bq. Do you think is it better to rename AMState.PENDING to inactivated?
Yes, PENDING is not understandable to everyone, which is why the diagnostic message for {{PENDING}} is already set to *"Application is added to the scheduler and is not yet activated."* Maybe I can word it as {{Application is added to the scheduler but is not yet scheduled.}} Thoughts?

bq. Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters Scheduled state, do you think is it better to do that in RUNNING and FINAL_SAVING state? Unmanaged AM could skip the SCHEDULED state.
IMO I would prefer to do that only for unmanaged AMs in the *FINAL_SAVING state*, since we already show the *YarnApplicationState* as running and give a description of it.
So, if the diagnostics also showed that the AM is launched and running, it would become repetitive in the UI for normal (non-unmanaged AM) apps.

bq. It will be also very useful if you can update AM launch diagnostics when RMAppAttempt go to LAUNCHED state,
Actually, I had wrongly used AMContainerAllocatedTransition to reset the diagnostic message; my intention was to reset it only after the AM is launched and registered. This would be very useful for analyzing the state of the AM. I have introduced {{LAUNCHED}}, which is set after AMLauncher sends the LAUNCHED event to RMAppAttempt.

[~wangda] & [~jianhe], please review the latest patch.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
> --------------------------------------------------------------------------------
>
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch,
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch,
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch
>
>
> Currently there is no direct way to get the exact reason as to why a
> submitted app is still in ACCEPTED state. It should be possible to know
> through RM REST API as to what aspect is not being met - say, queue limits
> being reached, or core/memory requirement not being met, or AM limit being
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)