Re: HDFS without Hadoop: Why?

2011-02-02 Thread Stuart Smith
> Stuart - if Dhruba is giving hdfs file and block sizes used by the namenode, you really cannot get a more authoritative number elsewhere :) Yes - very true! :) I spaced out on the name there ... ;) One more thing - I believe that if you're storing a lot of your smaller files in hbase, you

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Konstantin Shvachko
Thanks for the link, Stu. More details on the limitations are here: http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf I think that Nathan raised an interesting question and his assessment of HDFS use cases is generally right. Some assumptions, though, are outdated at this point.

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Gaurav Sharma
Stuart - if Dhruba is giving hdfs file and block sizes used by the namenode, you really cannot get a more authoritative number elsewhere :) I would do the back-of-envelope with ~160 bytes/file and ~150 bytes/block. On Wed, Feb 2, 2011 at 9:08 PM, Stuart Smith wrote: > > This is the best coverage
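A minimal sketch of that back-of-envelope, assuming the rough ~160 bytes/file and ~150 bytes/block figures quoted in this thread; the 100 million file / 200 million block counts are illustrative (they match the Yahoo! post quoted below), not measured values:

// Back-of-envelope namenode heap estimate using the rough per-object
// figures from this thread (~160 bytes/file, ~150 bytes/block).
// File/block counts are illustrative, not measured values.
public class NamenodeMemoryEstimate {
    static final long BYTES_PER_FILE  = 160L;
    static final long BYTES_PER_BLOCK = 150L;

    public static void main(String[] args) {
        long files  = 100_000_000L;   // 100 million files
        long blocks = 200_000_000L;   // ~2 blocks per file on average

        long metadataBytes = files * BYTES_PER_FILE + blocks * BYTES_PER_BLOCK;
        double metadataGb  = metadataBytes / 1e9;

        // ~16 GB for files + ~30 GB for blocks = ~46 GB of raw metadata.
        // A real heap needs headroom for transient objects, which is why
        // the Yahoo! post quoted below recommends "at least 60 GB" of RAM.
        System.out.printf("Raw metadata estimate: ~%.0f GB%n", metadataGb);
    }
}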

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Stuart Smith
This is the best coverage I've seen from a source that would know: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/ One relevant quote: To store 100 million files (referencing 200 million blocks), a name-node should have at least 60 GB of RAM. But, honestl

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Dhruba Borthakur
The Namenode uses around 160 bytes/file and 150 bytes/block in HDFS. This is a very rough calculation. dhruba On Wed, Feb 2, 2011 at 5:11 PM, Dhodapkar, Chinmay wrote: > What you describe is pretty much my use case as well. Since I don’t know > how big the number of files could get, I am tryin

RE: HDFS without Hadoop: Why?

2011-02-02 Thread Dhodapkar, Chinmay
What you describe is pretty much my use case as well. Since I don’t know how big the number of files could get, I am trying to figure out if there is a theoretical design limitation in hdfs… From what I have read, the name node will store all metadata of all files in RAM. Assuming (in my

RE: HDFS without Hadoop: Why?

2011-02-02 Thread Stuart Smith
Hello, I'm actually using hbase/hadoop/hdfs for lots of small files (with a long tail of larger files). Well, millions of small files - I don't know what you mean by lots :) Facebook probably knows better. But what I do is: - store metadata in hbase - files smaller than 10 MB or so in h
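A minimal sketch of the size-threshold routing Stuart describes, assuming an already-opened HBase Table and HDFS FileSystem handle; the table/column names, the "/blobs" path, and the exact 10 MB cutoff are illustrative, not the values actually used in his setup:

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

// Keep small payloads inline in an HBase cell (off the namenode),
// spill large ones to an HDFS file, and record metadata in HBase
// either way. Names and the cutoff below are illustrative.
public class FileRouter {
    private static final long SMALL_FILE_LIMIT = 10L * 1024 * 1024; // ~10 MB

    private final Table metaTable;   // HBase table holding metadata + small files
    private final FileSystem fs;     // HDFS handle for the large-file spillover

    public FileRouter(Table metaTable, FileSystem fs) {
        this.metaTable = metaTable;
        this.fs = fs;
    }

    public void store(String key, byte[] content) throws IOException {
        Put put = new Put(Bytes.toBytes(key));
        put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("size"),
                      Bytes.toBytes(content.length));

        if (content.length <= SMALL_FILE_LIMIT) {
            // Small file: keep the bytes inline in HBase.
            put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("blob"), content);
        } else {
            // Large file: write to HDFS and store only its path in HBase.
            Path path = new Path("/blobs/" + key);
            try (FSDataOutputStream out = fs.create(path)) {
                out.write(content);
            }
            put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("hdfsPath"),
                          Bytes.toBytes(path.toString()));
        }
        metaTable.put(put);
    }
}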

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Ian Holsman
Haystack is described here http://www.facebook.com/note.php?note_id=76191543919 Regards Ian --- Ian Holsman AOL Inc ian.hols...@teamaol.com (703) 879-3128 / AIM:ianholsman it's just a technicality On Feb 2, 2011, at 7:28 PM, "Dhodapkar, Chinmay" wrote: > Hello, > > > > I have been foll

RE: HDFS without Hadoop: Why?

2011-02-02 Thread Dhodapkar, Chinmay
Hello, I have been following this thread for some time now. I am very comfortable with the advantages of hdfs, but still have lingering questions about the usage of hdfs for general purpose storage (no mapreduce/hbase etc). Can somebody shed light on what the limitations are on the number of fi

Re: HDFS without Hadoop: Why?

2011-02-02 Thread Jeff Hammerbacher
> - Large block size wastes space for small files. The minimum file size is 1 block.

That's incorrect. If a file is smaller than the block size, it will only consume as much space as there is data in the file.

> - There are no hardlinks, softlinks, or quotas.

That's incorr
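A small sketch illustrating Jeff's first point, assuming a client Configuration whose default FileSystem points at HDFS; the file path is hypothetical. It just compares a small file's logical length, its nominal block size, and the space HDFS reports as actually consumed (roughly length × replication, not blockSize × replication):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Compare a small file's length, its nominal block size, and the space
// HDFS actually reports as consumed. Path is a hypothetical example.
public class SmallFileUsage {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/small-file.bin");   // hypothetical small file

        FileStatus st = fs.getFileStatus(path);
        System.out.println("length (bytes)     : " + st.getLen());
        System.out.println("nominal block size : " + st.getBlockSize());
        System.out.println("replication factor : " + st.getReplication());

        // Space consumed tracks the actual data length times replication,
        // not the full block size times replication.
        long consumed = fs.getContentSummary(path).getSpaceConsumed();
        System.out.println("space consumed     : " + consumed);
    }
}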