[ 
https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391597#comment-16391597
 ] 

Wei Yan commented on YARN-4488:
-------------------------------

Thanks for pinging, [~leftnoteasy]. I previously created YARN-7844, which 
mostly exposes scheduler-level metrics, including various scheduler ops 
(node_add, node_remove, allocate, update...) and the event queue size; not all 
of these may be included in YARN-7844.001.patch. This set of metrics would help 
us understand whether the RM scheduler is under pressure, what the scheduler's 
throughput is, and whether the scheduler itself has become a system bottleneck.

For this JIRA, the scheduling delay for a container or an application can vary 
for different reasons: the scheduler itself, resource availability, queue 
configs... I'm not sure how we could use this info in prod to tune queue 
configs. In our prod env, the top complaint from customers is that their jobs 
take a long time to run. That is mostly because their queues are short of 
resources, which is already covered by existing metrics (tracking available 
resources for each queue).

> CapacityScheduler: Compute per-container allocation latency and roll up to 
> get per-application and per-queue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4488
>                 URL: https://issues.apache.org/jira/browse/YARN-4488
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Karthik Kambatla
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-4485.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
