RE: Storing millions of small files

2012-05-23 Thread Brendan cheng
Quoting a message from mcsri...@gmail.com (Tue, 22 May 2012 21:56:31 -0700) to hdfs-user@hadoop.apache.org: Brendan, since you are looking for a distributed file system that can store many millions of files, try out MapR. A few customers have actually crossed over 1 trillion files

FW: Storing millions of small files

2012-05-23 Thread Jayaseelan E
-----Original Message----- From: Keith Wiley [mailto:kwi...@keithwiley.com] Sent: Tuesday, May 22, 2012 9:57 PM To: hdfs-user@hadoop.apache.org Subject: Re: Storing millions of small files In addition to the responses already provided, there is another downside to using Hadoop with numerous

Re: Storing millions of small files

2012-05-23 Thread Ted Dunning
Quoting a message from mcsri...@gmail.com (Tue, 22 May 2012 21:56:31 -0700) to hdfs-user@hadoop.apache.org: Brendan, since you are looking for a distributed file system that can store many millions of files, try out MapR. A few customers have

Re: Storing millions of small files

2012-05-22 Thread Wasif Riaz Malik
Hi Brendan, The number of files that can be stored in HDFS is limited by the size of the NameNode's RAM. The downside of storing small files is that you saturate the NameNode's RAM with a small data set (the sum of the sizes of all your small files). However, you can store around 100

Re: Storing millions of small files

2012-05-22 Thread Mohammad Tariq
Hi Brendan, Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies about 150 bytes. When we store many small files in HDFS, these small files occupy a large portion of the namespace (a large overhead on the namenode). As a consequence, the
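As a back-of-envelope illustration of the 150-bytes-per-object figure above: a small file (under one block) costs at least one file object plus one block object in the namenode's heap. A minimal sketch in Java; the file count and per-object size are illustrative assumptions, not measurements:

    public class NameNodeHeapEstimate {
        public static void main(String[] args) {
            long files = 100000000L;     // assume 100 million small files
            long objectsPerFile = 2;     // 1 file inode + 1 block per file
            long bytesPerObject = 150;   // rule-of-thumb figure cited above
            long heapBytes = files * objectsPerFile * bytesPerObject;
            // 100M files * 2 objects * 150 bytes = ~30 GB of namenode heap
            System.out.printf("~%.1f GB of namenode heap%n", heapBytes / 1e9);
        }
    }

By the same arithmetic, a 64 GB namenode heap tops out at roughly 200 million such files, which is why the advice above centers on the NameNode's RAM rather than on disk capacity.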

Re: Storing millions of small files

2012-05-22 Thread Harsh J
Brendan, The issue with using lots of small files is that your processing overhead increases (repeated, avoidable file open-read(little)-close calls). HDFS is also used by those who wish to heavily process the data they've stored, and with a huge number of files such a process is not going to be
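One standard way to avoid the repeated open-read-close cost described above is to pack the small files into a single container file before processing. A minimal sketch, assuming Hadoop 1.x-era APIs; the local source directory and the HDFS output path are hypothetical:

    import java.io.File;
    import java.io.FileInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical output path for the packed container file.
            Path out = new Path("/user/brendan/packed.seq");
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, Text.class, BytesWritable.class);
            try {
                // Hypothetical local directory full of small files.
                for (File f : new File("/tmp/small-files").listFiles()) {
                    byte[] data = new byte[(int) f.length()];
                    FileInputStream in = new FileInputStream(f);
                    try {
                        IOUtils.readFully(in, data, 0, data.length);
                    } finally {
                        in.close();
                    }
                    // Key = original file name, value = raw file contents.
                    writer.append(new Text(f.getName()), new BytesWritable(data));
                }
            } finally {
                writer.close();
            }
        }
    }

A job can then stream the whole SequenceFile with a single open and close, and the namenode tracks one file instead of millions.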

Re: Storing millions of small files

2012-05-22 Thread Keith Wiley
In addition to the responses already provided, there is another downside to using Hadoop with numerous files: it takes much longer to run a Hadoop job! Starting a Hadoop job consists of communication between the driver (which runs on a client machine outside the cluster) and the namenode to
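The split-computation step is where the file count bites: FileInputFormat produces at least one input split per file, so a million small files means a million map tasks to schedule. A minimal sketch (new-style mapreduce API, hypothetical input path) that prints how many splits, and hence map tasks, a job would launch:

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class CountSplits {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration());
            // Hypothetical HDFS directory holding the small files.
            FileInputFormat.addInputPath(job, new Path("/user/brendan/small-files"));
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            // Each small file yields its own split, i.e. its own map task.
            System.out.println("input splits (map tasks): " + splits.size());
        }
    }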

Re: Storing millions of small files

2012-05-22 Thread M. C. Srivas
Brendan, since you are looking for a distributed file system that can store many millions of files, try out MapR. A few customers have actually crossed over 1 trillion files without hitting problems. Small files and large files are handled equally well. Of course, if you are doing map-reduce, it is