[ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944053#comment-16944053
 ] 

Aravindan Vijayan edited comment on HDDS-2241 at 10/3/19 10:00 PM:
-------------------------------------------------------------------

[~aengineer] This was not based on data we have seen in read performance tests. 
This is just an observation based on looking through the code while working on 
HDDS-2188 along with [~msingh].

Even if this does not show up as a bottleneck at this point in time, IMHO the 
2nd option looks like an obvious performance improvement that we should make. 


> Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the 
> pipelines for a key
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-2241
>                 URL: https://issues.apache.org/jira/browse/HDDS-2241
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Major
>
> Currently, while looking up a key, the Ozone Manager gets the pipeline 
> information from SCM through an RPC for every block in the key. For large 
> files (> 1GB), we may end up making a lot of RPC calls for this. This can be 
> optimized in a couple of ways:
> * We can implement a batch getContainerWithPipeline API in SCM, which we can 
> use to get the pipeline information for all the blocks of a file. To bound 
> the number of containers passed to SCM in a single call, we can have a 
> fixed container batch size on the OM side. _Here, Number of calls = 1 (or k 
> depending on batch size)_
> * Instead, a simpler change would be to have a map (method local) of 
> ContainerID -> Pipeline that we get from SCM so that we don't need to make 
> repeated calls to SCM for the same containerID for a key. _Here, Number of 
> calls = Number of unique containerIDs_
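
To make the call counts of the two options concrete, here is a minimal Java
sketch. It is not the KeyManagerImpl code: the class PipelineLookupSketch, the
BlockLocation type, fetchPipelineFromScm, and partition are all hypothetical
stand-ins, the pipeline is reduced to a String, and the SCM RPC is simulated
with a counter. It only illustrates the method-local ContainerID -> Pipeline
map and how a fixed batch size would split the unique container IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineLookupSketch {

    /** Hypothetical stand-in for a block location; only the container ID matters here. */
    static final class BlockLocation {
        final long containerId;
        final long localId;

        BlockLocation(long containerId, long localId) {
            this.containerId = containerId;
            this.localId = localId;
        }
    }

    /** Counts simulated SCM calls so the saving is visible. */
    static int rpcCalls = 0;

    /** Simulated SCM lookup (assumption: the real lookup is one RPC per call). */
    static String fetchPipelineFromScm(long containerId) {
        rpcCalls++;
        return "pipeline-" + containerId;
    }

    /** Option 2: a method-local map, so each unique container ID is looked up once. */
    static List<String> resolvePipelines(List<BlockLocation> blocks) {
        Map<Long, String> pipelineByContainer = new HashMap<>();
        List<String> pipelines = new ArrayList<>(blocks.size());
        for (BlockLocation block : blocks) {
            // Only the first block of a given container triggers a lookup.
            String pipeline = pipelineByContainer.computeIfAbsent(
                    block.containerId, id -> fetchPipelineFromScm(id));
            pipelines.add(pipeline);
        }
        return pipelines;
    }

    /** Option 1 helper: split container IDs into batches of at most batchSize. */
    static List<List<Long>> partition(List<Long> containerIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < containerIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, containerIds.size());
            batches.add(new ArrayList<>(containerIds.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Six blocks spread over three containers.
        long[] containerIds = {1, 1, 2, 2, 2, 3};
        List<BlockLocation> blocks = new ArrayList<>();
        for (int i = 0; i < containerIds.length; i++) {
            blocks.add(new BlockLocation(containerIds[i], i));
        }

        List<String> pipelines = resolvePipelines(blocks);
        // prints "blocks=6 rpcCalls=3"
        System.out.println("blocks=" + pipelines.size() + " rpcCalls=" + rpcCalls);

        // With a batch size of 2, three unique containers need two batched calls.
        List<List<Long>> batches = partition(Arrays.asList(1L, 2L, 3L), 2);
        // prints "batched calls=2"
        System.out.println("batched calls=" + batches.size());
    }
}
```

In this sketch the map lives only for the duration of one resolvePipelines
call, so nothing is cached across key lookups and no invalidation is needed.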



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
