>
> Cloudera has a pretty detailed blog on this.
>
Indeed. See http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/.
The post is getting a bit long in the tooth but should contain some useful
information for you.
Regards,
Jeff
Replies inline.
On 11/14/09 9:55 PM, "Hrishikesh Agashe" wrote:
Hi,
The default DFS block size is 64 MB. Does this mean that if I put a file
smaller than 64 MB on HDFS, it will not be divided any further?
--Yes, the file will be stored in a single block per replica.
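
To make the block arithmetic concrete, here is a minimal sketch of how many blocks a file of a given size would be split into. It assumes the 64 MB default mentioned above; the function name `block_count` and the constants are illustrative and are not part of any Hadoop API.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 64 * MB  # assumed: the default block size quoted in this thread


def block_count(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Return how many blocks a file of this size is divided into.

    Hypothetical helper for illustration only: it rounds the file size
    up to a whole number of blocks (ceiling division).
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-file_size_bytes // block_size_bytes)


if __name__ == "__main__":
    for size_mb in (10, 64, 65, 200):
        print(size_mb, "MB ->", block_count(size_mb * MB), "block(s)")
```

Under these assumptions, any file up to 64 MB maps to one block (per replica), and only files larger than the block size are divided.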
I have lots and lots of
Hi,
The default DFS block size is 64 MB. Does this mean that if I put a file
smaller than 64 MB on HDFS, it will not be divided any further?
I have lots and lots of XML files and I would like to process them directly.
Currently I am converting them to sequence files (10 XMLs per sequence file)
and the