sodonnel opened a new pull request #2849: URL: https://github.com/apache/hadoop/pull/2849
This is a relatively simple change that reduces the memory used by the Directory Scanner and simplifies the logic in the ScanInfo object. The change ensures that the same File object is reused for all blocks in a directory; previously, a large part of the path was repeated for each block file. Apart from that, the directory scanner's logic is unchanged. Comparing heap dumps, the memory used for 100K blocks drops from ~35MB to ~19MB, or from ~350MB to ~190MB per 1M blocks, a reduction of about 46%.
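The idea of reusing one File object per directory can be illustrated with a minimal sketch. This is not the actual ScanInfo code from the patch: the class `SharedDirSketch`, the record `BlockEntry`, the helper `entriesFor`, and the example path are all hypothetical names chosen for illustration. It only assumes what the description states, namely that each block keeps a reference to a shared directory File plus its own short file name, and rebuilds the full path when needed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SharedDirSketch {

    /** Hypothetical per-block record; not the real ScanInfo class. */
    static final class BlockEntry {
        private final File dir;          // one instance shared by every block in the directory
        private final String blockName;  // only the short file name is stored per block

        BlockEntry(File dir, String blockName) {
            this.dir = dir;
            this.blockName = blockName;
        }

        File getDir() {
            return dir;
        }

        /** Rebuilds the full path on demand instead of storing it per block. */
        File getBlockFile() {
            return new File(dir, blockName);
        }
    }

    /** Creates one entry per block name, all pointing at the same directory File. */
    static List<BlockEntry> entriesFor(File dir, List<String> blockNames) {
        List<BlockEntry> entries = new ArrayList<>(blockNames.size());
        for (String name : blockNames) {
            entries.add(new BlockEntry(dir, name));
        }
        return entries;
    }

    public static void main(String[] args) {
        File dir = new File("/data/current/subdir0/subdir1"); // made-up example path
        List<BlockEntry> entries = entriesFor(dir, Arrays.asList("blk_1001", "blk_1002"));

        // Both entries reference the identical File instance for the directory.
        System.out.println(entries.get(0).getDir() == entries.get(1).getDir()); // prints true
        System.out.println(entries.get(1).getBlockFile().getName());            // prints blk_1002
    }
}
```

The trade-off in a layout like this is a small allocation each time the full path is requested, in exchange for not holding the repeated directory prefix in memory for every block.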