Re: DFS block size

2009-11-15 Thread Jeff Hammerbacher
> > Cloudera has a pretty detailed blog on this. > Indeed. See http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/. The post is getting a bit long in the tooth but should contain some useful information for you. Regards, Jeff

Re: DFS block size

2009-11-14 Thread Amogh Vasekar
Replies inline. On 11/14/09 9:55 PM, "Hrishikesh Agashe" wrote: Hi, Default DFS block size is 64 MB. Does this mean that if I put a file smaller than 64 MB on HDFS, it will not be divided any further? --Yes, the file will be stored in a single block per replica. I have lots and lots of
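For what it's worth, the single-block answer above is just ceiling division of the file size by the block size. A minimal sketch, assuming the 64 MB default mentioned in the thread and that every block except possibly the last is full; the helper name `block_count` is made up for illustration and is not part of any Hadoop API:

```python
# Hypothetical helper, not a Hadoop API: estimates how many blocks
# (per replica) a file of a given size would be split into.
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default cited in the thread


def block_count(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Return the number of blocks per replica for a file of this size."""
    if file_size_bytes < 0 or block_size <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Ceiling division: a partially filled last block still counts as one block.
    return (file_size_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    mb = 1024 * 1024
    for size in (10 * mb, 64 * mb, 64 * mb + 1, 200 * mb):
        print(size, "bytes ->", block_count(size), "block(s)")
```

So any file of 64 MB or less comes out as one block, and a file one byte over the limit comes out as two.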

DFS block size

2009-11-14 Thread Hrishikesh Agashe
Hi, Default DFS block size is 64 MB. Does this mean that if I put a file smaller than 64 MB on HDFS, it will not be divided any further? I have lots and lots of XMLs and I would like to process them directly. Currently I am converting them to sequence files (10 XMLs per sequence file) and the