Many thanks to Eli and Harsh for their responses! Comments in-line:
On 08/12/2012 09:48 AM, Harsh J wrote:
Hi Ellis,
Note that in Hadoop-land, the term "block size" generally means the
chunking size used by HDFS writers and readers, and that is not the
same as the filesystem term "block size" in any way.
Yes, I do know that, but I was confused about something else. More on
that later in #2.
On Thu, Aug 9, 2012 at 6:40 PM, Ellis H. Wilson III<[email protected]> wrote:
Can someone please briefly explain the difference? I do not see deprecation
warnings for fs.local.block.size when I run with it set, and I see what looks
like two copies of RawLocalFileSystem.java (the other being local/RawLocalFs.java).
The right param still seems to be "fs.local.block.size" when it comes
to "getDefaultBlockSize()" calls via the file:/// filesystem or
other filesystems that have not overridden the default behavior.
This question was more out of curiosity than anything. My experiments
agree that "fs.local.block.size" is the right parameter for controlling
the blocksize of file:///, but I'm still quite perplexed as to where
file.blocksize is actually used. I chased it around for a while in
Eclipse last night, but have yet to see where it is directly referenced
(the config-keys classes set it and suggest that FileSystem,
RawLocalFileSystem and ChecksumFileSystem all use it, but I don't see
it being used in any practical way).
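For the archives, here's the quick probe I used to convince myself. This is
a minimal sketch, not anything definitive: LocalBlockSizeProbe is just a
throwaway class name, and the 64 MB value is an arbitrary override chosen to
make the effect visible.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class LocalBlockSizeProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Arbitrary override, just to make the effect visible.
        conf.setLong("fs.local.block.size", 64 * 1024 * 1024);

        FileSystem local = FileSystem.get(URI.create("file:///"), conf);
        // RawLocalFileSystem does not override getDefaultBlockSize(), so
        // this falls through to the base FileSystem implementation, which
        // reads fs.local.block.size -- not file.blocksize.
        System.out.println("file:/// block size: " + local.getDefaultBlockSize());
      }
    }

With the override in place the reported size tracks fs.local.block.size;
setting file.blocksize instead has no visible effect here, as far as I can tell.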
The things I really need to get answers to are:
1. Was the default boosted to 64 MB from Hadoop 1.0 to Hadoop 2.0? I believe
it was, but want validation on that.
The dfs.blocksize, which applies to HDFS, has not changed from its 64
MB default.
I was referring to RawLocalFileSystem, not DistributedFileSystem. I am
fairly certain, from my tests and from the code I've dug through, that the
default blocksize there is still 32 MB at the moment. Please note that my
questions here are largely unconcerned with HDFS, as I'm not using it at
all in more than 75% of my tests.
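For concreteness, the fallback I keep landing on in the branch-1 source looks
roughly like this (paraphrasing from memory, so treat the exact body as
approximate):

    // Approximate shape of FileSystem#getDefaultBlockSize in branch-1.
    // DistributedFileSystem overrides this; RawLocalFileSystem does not,
    // so file:/// picks up fs.local.block.size with a 32 MB fallback.
    public long getDefaultBlockSize() {
      return getConf().getLong("fs.local.block.size", 32 * 1024 * 1024);
    }

That 32 MB fallback is why I believe the RawLocalFileSystem default hasn't moved.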
2. Which one controls shuffle block-size?
There is no "shuffle block-size", as shuffle output goes to the local
filesystem, which has no block-size concept. Can you elaborate on this?
This was a plain ol' misconception on my part, left over from when I
started working in the Hadoop source just over a year ago. When I increased
file:///'s blocksize I saw performance increases in TeraGen but decreases
in TeraSort (marked by an elongated shuffle phase), and I mistook that to
mean the shuffle used the file:/// filesystem as well. I now understand why
this can happen, and I appreciate the clarification; my digging through the
shuffle code has confirmed that indeed no chunking occurs on shuffle. My
apologies for the confusing question, which was based on errant inferences.
Thanks again to both of you! However, if anyone has better intuition on
what the file.blocksize parameter does, I'd be happy to hear it.
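In case it helps anyone reproduce, this is the sort of probe I've been running
against a 2.0 build, this time through the FileContext path that
local/RawLocalFs.java serves. Again a rough sketch: FileBlocksizeProbe is a
throwaway class name and 128 MB an arbitrary value. If file.blocksize were
consulted on this path, I'd expect the printed size to move with the override,
but so far I haven't seen it do so:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FsServerDefaults;

    public class FileBlocksizeProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Arbitrary override; if file.blocksize were read anywhere on this
        // path, the value printed below should change accordingly.
        conf.setLong("file.blocksize", 128 * 1024 * 1024);

        FileContext fc = FileContext.getLocalFSFileContext(conf);
        FsServerDefaults defaults = fc.getDefaultFileSystem().getServerDefaults();
        System.out.println("local server-default block size: "
            + defaults.getBlockSize());
      }
    }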
Best,
ellis