[ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated YARN-6788: -------------------------- Attachment: YARN-6788-YARN-3926.014.patch Thanks [~templedf] Updating new patch based on your comments. There are some comments which i have some doubts and clarification provided. Kindly check the same. bq.Is there a reason to have "known" (or "readOnly") in the variable names? Doesn't seem to add any useful information. Returning a map and then iterating on its values or keyset by using iterator is performance bottleneck. As per my profiling analysis, it contributed around 15% of performance as these methods are used as hotspot (Resources and ResourceCalculator api). Hence we keep a readonly DS to return as value for all these callers and work on same to loop or access on single entry based on index. bq.ResourceUtils.resourceNameToIndex should be RESOURCE_NAME_TO_INDEX Jus a doubt here. Why we need this to be in CAPS, is its because its final and static? bq.In ResourceUtils.initializeResourcesMap(), it seems to me that {{ knownResourceTypes = Collections.unmodifiableMap(resourceInformationMap);}} should be moved into ResourceUtils.updateReadOnlyResources() I think we will add one more argument to ResourceUtils.updateReadOnlyResources/knownResources to take map and then copy it. But its better to do as early as possible to avoid an extra argument as possible. bq.Why doesn't ResourceUtils.getResourceNamesArray() call getResourceTypes()? As mentioned in first comment, cost of looping on iterator of map.values() is costlierlieer. Hence its better to operate on simple array. bq.The for loop to update resource names in ResourceUtils.getResourceTypes() is a lovely opportunity for a lambda. Yes. If possible, i would like take it up separately as its not affecting functionality. bq.n the for loop to update resource names in ResourceUtils.getResourceTypes() memory and CPU aren't handled a priori, and there's no guarantee that the set iteration will match the map values iteration in updateReadOnlyResources() knownResourceNamesArray, resourceNameToIndex are in sync and in order. This is because in updateKnownResources, we update Mem and CPU as first two entries and other resource in following indices. Then resourceNameToIndex is derived from knownResourceNamesArray itself. All our getter in Resources and DominantCalcultor uses these DS. getResourceTypes is not used for normal computation. It just used for initialization. With this assumption we can assume resource will be in order. Please correct me if you feel some issues. bq.In the FixedValueResource() constructor, the resources array is not created according to the resource type index, so resource lookups will fail. In {{FixedValueResource.initResourceMap}} we use {{ResourceUtils.getResourceTypesArray}} to get the index of resources. then we created all resources in order. bq.In Resources.addTo() the variable in the foreach should be named lhsValue instead of declaring a new variable for it. I have a loop optimization patch ready in this area. And i ll upload in another jira. I would like that patch separate and on top of this since its loop optimization for all apis in Resources and DominantResourceCalculator class. Hence I will take this comment in that one if its fine. bq.There's now no class that extends Resource other than BaseResource. Maybe it's time to merge BaseResource into Resource? I think its too early for that. Resource is basic public class and BaseResource is internal class to operate on resource compared to ResourcePBImpl. I think we can revise this once its merged and used well. bq.In ResourcePBImpl.initResources(), the copy to readOnlyResources should probably be moved into initResourcesMap() Actually resources is created in initResourcesMap, but its populated in main array. So I placed it there. Regarding lock object in ResourceUtils, we use a double lock model to handle data corruption during init. I guess this may be a cleaner approach ,how do you feel? I will wait for jenkins to see if any test error, if so i ll update another patch > Improve performance of resource profile branch > ---------------------------------------------- > > Key: YARN-6788 > URL: https://issues.apache.org/jira/browse/YARN-6788 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager > Reporter: Sunil G > Assignee: Sunil G > Priority: Blocker > Attachments: YARN-6788-YARN-3926.001.patch, > YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, > YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, > YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, > YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, > YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, > YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch, > YARN-6788-YARN-3926.014.patch > > > Currently we could see a 15% performance delta with this branch. > Few performance improvements to improve the same. > Also this patch will handle > [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418] > from [~leftnoteasy]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org