Re: how to improve Hadoop's capability of dealing with small files

2009-05-12 Thread Rasit OZDAS
I have a similar situation with very small files. I have never tried HBase (I want to), but you can also group them and write, say, 20-30 into one big file, where every original file becomes a key in that big file. There are methods in the API with which you can write an object as a file into HDFS and read it back to get ...
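A minimal sketch of that packing idea, assuming the "big file" is a SequenceFile whose key is the original file name and whose value is the file's raw bytes (the paths and class name below are made up, not from the thread):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path input = new Path("/user/me/small-files");   // directory full of small files (hypothetical)
      Path packed = new Path("/user/me/packed.seq");   // the one big file

      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, packed, Text.class, BytesWritable.class);
      try {
        for (FileStatus status : fs.listStatus(input)) {
          byte[] contents = new byte[(int) status.getLen()];
          FSDataInputStream in = fs.open(status.getPath());
          try {
            in.readFully(0, contents);                 // files are small, so one read is enough
          } finally {
            in.close();
          }
          // key = original file name, value = the file's bytes
          writer.append(new Text(status.getPath().getName()), new BytesWritable(contents));
        }
      } finally {
        IOUtils.closeStream(writer);
      }
    }
  }

A later job can then read the packed file with SequenceFileInputFormat, so one map task handles many of the original files.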

Re: how to improve Hadoop's capability of dealing with small files

2009-05-07 Thread jason hadoop
The way I typically address that is to write a zip file using the zip utilities, commonly for output. HDFS is not optimized for low latency, but for high throughput on bulk operations.
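A short sketch of that pack-into-an-archive idea, using java.util.zip directly against an HDFS stream (this is not necessarily what the poster means by "the zip utilities"; the output path and entry names are invented):

  import java.io.IOException;
  import java.nio.charset.StandardCharsets;
  import java.util.zip.ZipEntry;
  import java.util.zip.ZipOutputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ZipOutputExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // One zip archive on HDFS instead of thousands of tiny output files.
      ZipOutputStream zip = new ZipOutputStream(fs.create(new Path("/user/me/output.zip")));
      try {
        for (int i = 0; i < 3; i++) {
          zip.putNextEntry(new ZipEntry("part-" + i + ".txt"));  // one entry per logical small file
          zip.write(("record " + i + "\n").getBytes(StandardCharsets.UTF_8));
          zip.closeEntry();
        }
      } finally {
        zip.close();
      }
    }
  }

The trade-off is that a plain zip archive is not splittable, so a single task ends up reading the whole thing.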

Re: how to improve Hadoop's capability of dealing with small files

2009-05-07 Thread Edward Capriolo
2009/5/7 Jeff Hammerbacher: > Hey, you can read more about why small files are difficult for HDFS at > http://www.cloudera.com/blog/2009/02/02/the-small-files-problem. > Regards, > Jeff > 2009/5/7 Piotr Praczyk: >> If you want to use many small files, they probably have the same >> purpose and structure? >> Why not use HBase instead ...

Re: how to improve Hadoop's capability of dealing with small files

2009-05-07 Thread Jeff Hammerbacher
Hey, you can read more about why small files are difficult for HDFS at http://www.cloudera.com/blog/2009/02/02/the-small-files-problem. Regards, Jeff

Re: how to improve Hadoop's capability of dealing with small files

2009-05-07 Thread Piotr Praczyk
If you want to use many small files, they probably have the same purpose and structure? Why not use HBase instead of raw HDFS? Many small files would be packed together and the problem would disappear. Cheers, Piotr
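A rough sketch of the HBase alternative, one row per small file; the table name, column family, and row key are hypothetical, and this uses the current HBase client API rather than the one from the time of this thread:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SmallFileToHBase {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("small_files"))) {
        byte[] content = "contents of one small file".getBytes("UTF-8");
        // row key = original file name; one column holds the file's bytes
        Put put = new Put(Bytes.toBytes("doc-00001.xml"));
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), content);
        table.put(put);
      }
    }
  }

HBase then packs many such rows into a few large store files on HDFS, which is what makes the per-file overhead disappear.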

Re: how to improve Hadoop's capability of dealing with small files

2009-05-06 Thread Jonathan Cao
There are at least two design choices in Hadoop that have implications for your scenario. 1. All the HDFS metadata is stored in NameNode memory -- the memory size is one limitation on how many "small" files you can have. 2. The efficiency of the map/reduce paradigm dictates that each mapper/reducer job ...
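To put a rough number on the first point (the per-object cost is a commonly cited approximation, not a figure from this thread): at roughly 150 bytes of NameNode heap per file, directory, or block object, 10 million one-block files come to about 20 million objects, i.e. around 3 GB of heap spent on metadata alone.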

Re: how to improve Hadoop's capability of dealing with small files

2009-05-06 Thread imcaptor
Please try -D dfs.block.size=4096000. The value must be given in bytes. On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote: > Hi all, > I have a job that creates very big local files, so I need to split it across as many mappers as possible. Now the DFS block size I'm using ...
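A minimal driver sketch showing where a -D option like that lands (the class, job, and path names are hypothetical); note that -D properties only reach the job configuration if the driver goes through ToolRunner/GenericOptionsParser:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class SmallBlockJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
      Configuration conf = getConf();            // -D options arrive here via GenericOptionsParser
      conf.setLong("dfs.block.size", 4096000L);  // same effect as -D dfs.block.size=4096000 (bytes)

      Job job = Job.getInstance(conf, "small-block-job");
      job.setJarByClass(SmallBlockJob.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new SmallBlockJob(), args));
    }
  }

Keep in mind that a smaller block size only yields more splits for files written under that setting; it does not re-split files already sitting in HDFS.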