[jira] [Updated] (YARN-6788) Improve performance of resource profile branch

Sunil G (JIRA) Thu, 27 Jul 2017 04:59:27 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sunil G updated YARN-6788:
--------------------------
    Attachment: YARN-6788-YARN-3926.014.patch

Thanks [~templedf]

Updating new patch based on your comments. There are some comments which i have 
some doubts and clarification provided. Kindly check the same.

bq.Is there a reason to have "known" (or "readOnly") in the variable names? 
Doesn't seem to add any useful information.
Returning a map and then iterating on its values or keyset by using iterator is 
performance bottleneck. As per my profiling analysis, it contributed around 15% 
of performance as these methods are used as hotspot (Resources and 
ResourceCalculator api). Hence we keep a readonly DS to return as value for all 
these callers and work on same to loop or access on single entry based on index.

bq.ResourceUtils.resourceNameToIndex should be RESOURCE_NAME_TO_INDEX
Jus a doubt here. Why we need this to be in CAPS, is its because its final and 
static? 

bq.In ResourceUtils.initializeResourcesMap(), it seems to me that {{ 
knownResourceTypes = Collections.unmodifiableMap(resourceInformationMap);}} 
should be moved into ResourceUtils.updateReadOnlyResources()
I think we will add one more argument to 
ResourceUtils.updateReadOnlyResources/knownResources to take map and then copy 
it. But its better to do as early as possible to avoid an extra argument as 
possible.

bq.Why doesn't ResourceUtils.getResourceNamesArray() call getResourceTypes()?
As mentioned in first comment, cost of looping on iterator of map.values() is 
costlierlieer. Hence its better to operate on simple array.

bq.The for loop to update resource names in ResourceUtils.getResourceTypes() is 
a lovely opportunity for a lambda. 
Yes. If possible, i would like take it up separately as its not affecting 
functionality.

bq.n the for loop to update resource names in ResourceUtils.getResourceTypes() 
memory and CPU aren't handled a priori, and there's no guarantee that the set 
iteration will match the map values iteration in updateReadOnlyResources()
knownResourceNamesArray, resourceNameToIndex are in sync and in order. This is 
because in updateKnownResources, we update Mem and CPU as first two entries and 
other resource in following indices. Then resourceNameToIndex is derived from 
knownResourceNamesArray itself. All our getter in Resources and 
DominantCalcultor uses these DS. getResourceTypes is not used for normal 
computation. It just used for initialization.

With this assumption we can assume resource will be in order. Please correct me 
if you feel some issues.

bq.In the FixedValueResource() constructor, the resources array is not created 
according to the resource type index, so resource lookups will fail.
In {{FixedValueResource.initResourceMap}} we use 
{{ResourceUtils.getResourceTypesArray}} to get the index of resources. then we 
created all resources in order. 

bq.In Resources.addTo() the variable in the foreach should be named lhsValue 
instead of declaring a new variable for it.
I have a loop optimization patch ready in this area. And i ll upload in another 
jira. I would like that patch separate and on top of this since its loop 
optimization for all apis in Resources and DominantResourceCalculator class. 
Hence I will take this comment in that one if its fine.

bq.There's now no class that extends Resource other than BaseResource. Maybe 
it's time to merge BaseResource into Resource?
I think its too early for that. Resource is basic public class and BaseResource 
is internal class to operate on resource compared to ResourcePBImpl. I think we 
can revise this once its merged and used well.

bq.In ResourcePBImpl.initResources(), the copy to readOnlyResources should 
probably be moved into initResourcesMap()
Actually resources is created in initResourcesMap, but its populated in main 
array. So I placed it there.

Regarding lock object in ResourceUtils, we use a double lock model to handle 
data corruption during init. I guess this may be a cleaner approach ,how do you 
feel?
I will wait for jenkins to see if any test error, if so i ll update another 
patch

> Improve performance of resource profile branch
> ----------------------------------------------
>
>                 Key: YARN-6788
>                 URL: https://issues.apache.org/jira/browse/YARN-6788
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Blocker
>         Attachments: YARN-6788-YARN-3926.001.patch, 
> YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, 
> YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, 
> YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, 
> YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, 
> YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, 
> YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch, 
> YARN-6788-YARN-3926.014.patch
>
>
> Currently we could see a 15% performance delta with this branch. 
> Few performance improvements to improve the same.
> Also this patch will handle 
> [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
>  from [~leftnoteasy].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-6788) Improve performance of resource profile branch

Reply via email to