[ 
https://issues.apache.org/jira/browse/HDFS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116892#comment-13116892
 ] 

Todd Lipcon commented on HDFS-2384:
-----------------------------------

A couple of optimizations are possible:
1) First, scan just the directory hierarchy, gathering the paths of the block 
files and their inode numbers. Then sort the block files by inode number and 
stat the inodes in ascending order. In tests with a cold cache on 160K blocks, 
this method of block report generation is approximately twice as fast as 
"find /data/1/todd/hdfs/current/ -name blk_\* -a -not -name \*.meta -a 
-size +1". Unfortunately, this involves calling out to a C program.
2) In 0.20 we currently use String.split to parse the filenames. This is rather 
CPU-inefficient.
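
To make the ordering in (1) concrete, here is a rough sketch in Python rather than C. It is not the attached breport.c, just an illustration of the same idea. The function names are made up for this example. It assumes a POSIX system where os.scandir() can report inode numbers straight from the directory entries without a per-file stat, and it does not replicate the "-size +1" filter from the find command.

```python
import os


def scan_block_files(root):
    """Pass 1: walk the tree, collecting (inode, path) for block files.

    Illustrative helper. On POSIX, entry.inode() comes from the directory
    entry itself, so no stat() of the file is needed here.
    """
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.startswith("blk_") and not entry.name.endswith(".meta"):
                    found.append((entry.inode(), entry.path))
    return found


def block_sizes_in_inode_order(root):
    """Pass 2: stat the block files in ascending inode order."""
    results = []
    # Tuples sort by their first element, so this orders by inode number.
    for _inode, path in sorted(scan_block_files(root)):
        results.append((path, os.stat(path).st_size))
    return results
```

Calling block_sizes_in_inode_order("/data/1/todd/hdfs/current") would return (path, size) pairs in the order the inodes were visited. Whether this gives the same speedup as the C program would need to be measured on a cold cache.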
                
> Improve speed of block report generation
> ----------------------------------------
>
>                 Key: HDFS-2384
>                 URL: https://issues.apache.org/jira/browse/HDFS-2384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.206.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: breport.c
>
>
> This JIRA is to track a couple of potential improvements to the speed of 
> block report generation while scanning the disks. In 0.20, the disks are 
> scanned for every block report, though in trunk the block reports are 
> generally built from memory.


        
