[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644207#comment-16644207
 ] 

Szilard Nemeth commented on YARN-8842:
--------------------------------------

According to our offline discussion with [~haibochen], here are the changes 
incorporated with the latest patch (patch006).
As I applied all the changes listed here as separate steps and ran the tests 
with every change, an unwanted code change is quite unlikely as the test were 
always passed.

1. Unified resource decrease methods into one single method, so that: 
withContainersToDecrease, withVCoresToDecrease, withMemoryMBToDecrease, 
withCustomResToDecrease became withResourceToDecrease. This way, testcases are 
more concise and explicit.
2. Same as above, but for the declaration of resources and not decrease of 
resources.
2. Changed the QueueHierarchy class: instead of representing the hierarchy as a 
list of queue names, I created a Queue class that holds its name and a 
reference to its child queue. The validation code is also changed, an on-demand 
validation of queue names does happen when adding a child queue to the 
hierarchy.
3. The common parts of initializing QueueMetricsTestcase was extracted to a 
method.
4. Fixed test methods that only tested leaf queue metrics: now these methods 
get all the queue metrics and check the values on all queue levels. Example: 
testUpdatePreemptedSeconds previously only checked mqs.getLeafQueueSource, now 
it gets all the queue metrics sources and checks them.
5. Created separate classes for every type of custom resource: preempted 
seconds, allocated, available, pending, reserved.
Kept QueueMetricsForCustomResources class as a wrapper of all types, so that 
QueueMetrics should only refer to this class and it will delegate all 
alterations to the appropriate resource metric type objects.
Having an abstract class as a parent for the metric type classes named 
QueueMetricsCustomResourcesAbstract, I could keep the computation methods in 
one single place.

6. Added the performance optimization as [~wilfreds] suggested: Custom resource 
metric update only happen if there is any custom resource defined.

Please note that: 
1. The testcases in QueueMetricsForCustomResources would be more directly 
readable if the testXXX methods in QueueMetricsTestcase were moved to there. 
The reason why I don't do that is that QueueMetricsTestcase is intended to be a 
self-contained class that holds the data and the logic as well for all types of 
metrics testcases. Ideally, QueueMetricsTestcase could be used even from 
TestQueueMetrics, but I did not want to squeeze way too many changes into this 
patch.

2. I think it's not a right decision to move all metrics for cores / memory to 
QueueMetricsForCustomResources, as the former are annotated Metrics, the latter 
are not real metrics, but values stored in a map for every resource. 
All in all, I like to idea to have all resource metrics in one class per type, 
but I would do that in a follow-up jira. One more thing to keep in mind is that 
AFAIK, the metric subsytem from Hadoop-common is not ready for using metrics 
from dynamic fields (e.g. Maps) so this will be quite tricky to implement.

> Update QueueMetrics with custom resource values 
> ------------------------------------------------
>
>                 Key: YARN-8842
>                 URL: https://issues.apache.org/jira/browse/YARN-8842
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to