[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-3946:
------------------------------------
    Attachment: YARN-3946.v1.005.patch

Thanks for the comments [~wangda],
bq. When the app goes to a final state (FINISHED/KILLED, etc.), should we simply set 
AMLaunchDiagnostics to null?
IIUC you are referring to RMAppAttemptImpl, right? If so, that was a mistake: 
while addressing your previous comment I missed reverting this part. In any 
case, as per your 4th comment, I have now set it to null here for unmanaged AMs.

bq. Why need two separate methods: 
updateDiagnosticsIfNotRunning/updateDiagnostics?
Maybe the names need to be better, but two methods are required because the 
status needs to be updated only if the AM is not running. For example, it is 
called in FiCaSchedulerApp.allocate, which is invoked whenever a container is 
assigned to an app, but we want to update the diagnostics only while the AM is 
not yet launched; it is used similarly in LeafQueue.assignContainers. In other 
cases we are sure that the AM is not yet launched, so to avoid the unnecessary 
check (whether the AM is running) we have updateDiagnostics. Maybe I can 
name them {{checkAndUpdateAMContainerDiagnostics}} and 
{{updateAMContainerDiagnostics}}?

bq. Do you think it is better to rename AMState.PENDING to inactivated?
Yes, PENDING may not be clear to everyone, which is why the diagnostic message 
for {{PENDING}} is already set to *"Application is added to the scheduler and is 
not yet activated."* Maybe I can change it to {{Application is added to the 
scheduler but is not yet scheduled.}} Thoughts? 
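A minimal sketch of the idea that each AM state carries its own user-facing message, so the text shown in the UI matters more than the constant's name. The enum name AMStateSketch is hypothetical; the PENDING text is the one quoted above, while the LAUNCHED text is an assumed wording for illustration:

```java
/** Hypothetical AM state enum where each constant carries its diagnostic text. */
enum AMStateSketch {
    // Message quoted in the comment above.
    PENDING("Application is added to the scheduler and is not yet activated."),
    // Assumed wording, for illustration only.
    LAUNCHED("AM container is launched, waiting for the AM to register with the RM.");

    private final String diagnosticMessage;

    AMStateSketch(String diagnosticMessage) {
        this.diagnosticMessage = diagnosticMessage;
    }

    String getDiagnosticMessage() {
        return diagnosticMessage;
    }
}
```

With this shape, renaming PENDING (e.g. to INACTIVATED) or rewording its message are independent changes.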

bq. Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters the 
SCHEDULED state, do you think it is better to do that in the RUNNING and 
FINAL_SAVING states? Unmanaged AMs could skip the SCHEDULED state.
IMO I would prefer to set it to null only for unmanaged AMs, in the 
*FINAL_SAVING state*, since we already show the *YarnApplicationState* as 
RUNNING along with a description of it. If the diagnostics also said that the 
AM is launched and running, the UI would become repetitive for normal 
(managed AM) apps.
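A minimal sketch of the rule proposed above, assuming a hypothetical subset of attempt states and a hypothetical per-attempt holder (AttemptDiagnosticsSketch); this is not the RMAppAttemptImpl state machine, only the condition under which the diagnostics get cleared:

```java
/** Hypothetical subset of the attempt states mentioned in this thread. */
enum AttemptStateSketch { SCHEDULED, LAUNCHED, RUNNING, FINAL_SAVING }

/** Hypothetical per-attempt holder for AM launch diagnostics (sketch only). */
class AttemptDiagnosticsSketch {
    private final boolean unmanagedAM;
    private String amLaunchDiagnostics;

    AttemptDiagnosticsSketch(boolean unmanagedAM, String initialDiagnostics) {
        this.unmanagedAM = unmanagedAM;
        this.amLaunchDiagnostics = initialDiagnostics;
    }

    /** Called on entering a state; clears only for unmanaged AMs in FINAL_SAVING. */
    void onEnterState(AttemptStateSketch state) {
        if (unmanagedAM && state == AttemptStateSketch.FINAL_SAVING) {
            amLaunchDiagnostics = null;
        }
        // Every other case leaves the diagnostics untouched in this sketch.
    }

    String getAmLaunchDiagnostics() {
        return amLaunchDiagnostics;
    }
}
```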

bq. It will also be very useful if you can update the AM launch diagnostics when 
RMAppAttempt goes to the LAUNCHED state, 
Actually, I wrongly used AMContainerAllocatedTransition to reset the diagnostic 
message; my intention was to reset it only after the AM is launched and 
registered. This would be very useful for analyzing the state of the AM. I have 
introduced {{LAUNCHED}} and set it after AMLauncher sends the LAUNCHED event to RMAppAttempt.
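A minimal sketch of the ordering described above, assuming hypothetical event names and a hypothetical holder class (LaunchDiagnosticsSketch): container allocation no longer resets the message, the LAUNCHED event records a "launched" message, and the message is cleared only once the AM registers. The LAUNCHED message text is an assumed wording:

```java
/** Hypothetical events an attempt receives in this sketch. */
enum AttemptEventSketch { AM_CONTAINER_ALLOCATED, LAUNCHED, REGISTERED }

/** Hypothetical per-attempt diagnostics driven by attempt events (sketch only). */
class LaunchDiagnosticsSketch {
    // Assumed wording, for illustration only.
    static final String LAUNCHED_MSG =
        "AM container is launched, waiting for AM container to register with RM";

    private String diagnostics;

    LaunchDiagnosticsSketch(String initial) {
        this.diagnostics = initial;
    }

    void handle(AttemptEventSketch event) {
        switch (event) {
            case AM_CONTAINER_ALLOCATED:
                // Deliberately unchanged: allocation alone does not mean the AM is up.
                break;
            case LAUNCHED:
                // Sent once the launcher has started the AM container.
                diagnostics = LAUNCHED_MSG;
                break;
            case REGISTERED:
                // AM is launched and registered: nothing left to explain.
                diagnostics = null;
                break;
        }
    }

    String getDiagnostics() {
        return diagnostics;
    }
}
```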

[~wangda] & [~jianhe],
please review the latest patch.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> --------------------------------------------------------------------------------
>
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through the RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
