The block size controls many things in Hadoop: it affects read parallelism, scalability, and block allocation, among other aspects of operation, either directly or indirectly.
On Sun, May 12, 2013 at 10:38 AM, shashwat shriparv <dwivedishash...@gmail.com> wrote:

> The block size is for allocation, not storage on the disk.
>
> *Thanks & Regards*
>
> ∞
> Shashwat Shriparv
>
> On Fri, May 10, 2013 at 8:54 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Thanks. I failed to add: it should be okay to do if those cases are
>> true and the cluster seems under-utilized right now.
>>
>> On Fri, May 10, 2013 at 8:29 PM, yypvsxf19870706
>> <yypvsxf19870...@gmail.com> wrote:
>>
>>> Hi Harsh,
>>>
>>> Yep.
>>>
>>> Regards
>>>
>>> Sent from my iPhone
>>>
>>> On 2013-5-10, at 13:27, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> Are you looking to decrease it to get more parallel map tasks out of
>>>> the small files? Are you currently CPU-bound on processing these small
>>>> files?
>>>>
>>>> On Thu, May 9, 2013 at 9:12 PM, YouPeng Yang
>>>> <yypvsxf19870...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am going to set up a new Hadoop environment. Because there are
>>>>> lots of small files, I would like to change the default block size to
>>>>> 16 MB rather than merging the files into large enough ones (e.g.,
>>>>> using SequenceFiles).
>>>>> I want to ask: are there any bad influences or issues?
>>>>>
>>>>> Regards
>>>>
>>>> --
>>>> Harsh J
>>
>> --
>> Harsh J
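[Editor's note: the parallelism argument in the thread above can be sketched numerically. By default, each HDFS block of an input file becomes one input split, and hence one map task. The helper `num_blocks` below is illustrative arithmetic, not a Hadoop API; the 100 MB file size and 64 MB default block size are assumed example values.]

```python
import math

def num_blocks(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks (and hence, by default, input splits /
    map tasks) for a single file of the given size."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

MB = 1024 * 1024

# A 100 MB file under the classic 64 MB default vs. a 16 MB block size:
print(num_blocks(100 * MB, 64 * MB))  # 2 blocks -> 2 map tasks
print(num_blocks(100 * MB, 16 * MB))  # 7 blocks -> 7 map tasks
```

In practice the setting under discussion is `dfs.blocksize` (named `dfs.block.size` in older releases) in `hdfs-site.xml`, given in bytes; note that many small files still cost NameNode memory regardless of the block size, since each file, directory, and block is tracked as a separate object.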