[ https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391597#comment-16391597 ]
Wei Yan commented on YARN-4488:
-------------------------------

Thanks for pinging, [~leftnoteasy].

I created YARN-7844 previously. It mostly exposes related metrics at the scheduler level, including metrics for the various scheduler ops (node_add, node_remove, allocate, update, ...) and the event queue size, though not all of these may be included in YARN-7844.001.patch yet. This set of metrics would help us understand whether the RM scheduler is under pressure, what the throughput of the scheduler is, and whether the scheduler itself has become a system bottleneck.

For this JIRA, the scheduling delay for a container or an application can vary for different reasons: the scheduler itself, resource availability, queue configs, and so on. I'm not sure how we could use this information in prod to tune queue configs. In our prod env, the top complaint from customers is that their jobs take a long time to run. That is mostly because their queues are short of resources, which is already covered by existing metrics (tracking the available resources for each queue).

> CapacityScheduler: Compute per-container allocation latency and roll up to
> get per-application and per-queue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4488
>                 URL: https://issues.apache.org/jira/browse/YARN-4488
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Karthik Kambatla
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-4485.001.patch
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
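To make the issue title concrete, here is a minimal sketch of what "compute per-container allocation latency and roll up to per-application and per-queue" could look like. It is a hypothetical illustration, not the YARN-4488 or YARN-7844 patch and not a CapacityScheduler API. The class `AllocationLatencyRollup`, the `ContainerSample` type, and its request/allocation timestamps are all assumptions. The sketch keeps simple lifetime aggregates using the JDK's `LongSummaryStatistics`.

```java
import java.util.HashMap;
import java.util.LongSummaryStatistics;
import java.util.Map;

/**
 * Hypothetical sketch: roll per-container allocation latency up to
 * per-application and per-queue aggregates. Not a real YARN class.
 */
public class AllocationLatencyRollup {

    /** One allocated container (hypothetical fields): when it was requested and when it was allocated. */
    public static final class ContainerSample {
        final String queue;
        final String appId;
        final long requestedAtMs;
        final long allocatedAtMs;

        public ContainerSample(String queue, String appId, long requestedAtMs, long allocatedAtMs) {
            if (allocatedAtMs < requestedAtMs) {
                throw new IllegalArgumentException("allocation time precedes request time");
            }
            this.queue = queue;
            this.appId = appId;
            this.requestedAtMs = requestedAtMs;
            this.allocatedAtMs = allocatedAtMs;
        }

        /** Per-container allocation latency: time from request to allocation. */
        long latencyMs() {
            return allocatedAtMs - requestedAtMs;
        }
    }

    // Count/min/max/sum/average of latency, keyed by application ID and by queue name.
    private final Map<String, LongSummaryStatistics> perApp = new HashMap<>();
    private final Map<String, LongSummaryStatistics> perQueue = new HashMap<>();

    /** Folds one container's latency into both its application's and its queue's aggregate. */
    public void record(ContainerSample s) {
        long latency = s.latencyMs();
        perApp.computeIfAbsent(s.appId, k -> new LongSummaryStatistics()).accept(latency);
        perQueue.computeIfAbsent(s.queue, k -> new LongSummaryStatistics()).accept(latency);
    }

    /** Aggregate for one application; an empty aggregate if nothing was recorded for it. */
    public LongSummaryStatistics forApp(String appId) {
        return perApp.getOrDefault(appId, new LongSummaryStatistics());
    }

    /** Aggregate for one queue; an empty aggregate if nothing was recorded for it. */
    public LongSummaryStatistics forQueue(String queue) {
        return perQueue.getOrDefault(queue, new LongSummaryStatistics());
    }

    public static void main(String[] args) {
        AllocationLatencyRollup rollup = new AllocationLatencyRollup();
        // Made-up samples: (queue, application, requested at, allocated at), in milliseconds.
        rollup.record(new ContainerSample("root.a", "app_1", 0, 120));
        rollup.record(new ContainerSample("root.a", "app_1", 10, 50));
        rollup.record(new ContainerSample("root.a", "app_2", 0, 400));
        rollup.record(new ContainerSample("root.b", "app_3", 5, 25));

        LongSummaryStatistics queueA = rollup.forQueue("root.a");
        System.out.printf("root.a: n=%d avg=%.1fms max=%dms%n",
                queueA.getCount(), queueA.getAverage(), queueA.getMax());
        LongSummaryStatistics app1 = rollup.forApp("app_1");
        System.out.printf("app_1: n=%d avg=%.1fms max=%dms%n",
                app1.getCount(), app1.getAverage(), app1.getMax());
    }
}
```

The number this produces is only the end-to-end delay per container. As the comment points out, it cannot by itself say whether the delay came from the scheduler, from resource availability, or from queue configs. Separating those would presumably need other signals, such as the scheduler-level metrics described for YARN-7844 or the existing per-queue available-resource metrics. A production version would also likely want windowed or decaying statistics rather than lifetime totals.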