This is a dated blog post, so it would help if someone with current HDFS knowledge could validate it: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
There is a bit about the RAM required for the Namenode and how to compute it: you can look at the 'Namespace limitations' section.

Thanks
hemanth

On Thu, Dec 13, 2012 at 10:57 AM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello Chris,
>
> Thank you so much for the valuable insights. I was actually using the
> same principle. I made a blunder and did the maths for the entire
> (9*3) PB.
>
> Seems I came out higher than you, and that too without drinking ;)
>
> Many thanks.
>
> Regards,
> Mohammad Tariq
>
>
> On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <cemb...@gmail.com> wrote:
>
>> Hi Mohammad,
>>
>> The amount of RAM on the NN is related to the number of blocks... so
>> let's do some math. :) 1 GB of RAM per 1 million blocks seems to be
>> the general rule.
>>
>> I'll probably mess this up, so someone check my math:
>>
>> 9 PB ~ 9,216 TB ~ 9,437,184 GB of data. Let's put that in 128 MB
>> blocks: according to kcalc, that's 75,497,472 blocks of 128 MB each.
>> Unless I missed this by an order of magnitude (entirely possible...
>> I've been drinking since 6), that sounds like 76 GB of RAM (above OS
>> requirements). 128 GB should kick its ass; 256 GB seems like a waste
>> of $$.
>>
>> Hmm... that makes the NN sound extremely efficient. Someone validate
>> me or kick me to the curb.
>>
>> YMMV ;)
>>
>>
>> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>>> Hello Michael,
>>>
>>> It's an array. The actual size of the data could be somewhere around
>>> 9 PB (exclusive of replication), and we want to keep the number of
>>> DNs as low as possible. Computations are not too frequent, as I have
>>> specified earlier. If I have 500 TB per DN, the number of DNs would
>>> be around 49. And if the block size is 128 MB, the number of blocks
>>> would be 201,326,592. So I was thinking of having 256 GB RAM for the
>>> NN. Does this make sense to you?
>>>
>>> Many thanks.
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <michael_se...@hotmail.com> wrote:
>>>
>>>> 500 TB?
>>>>
>>>> How many nodes in the cluster? Is this attached storage, or is it
>>>> in an array?
>>>>
>>>> I mean, if you have 4 nodes for a total of 2 PB, what happens when
>>>> you lose 1 node?
>>>>
>>>>
>>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>> Hello list,
>>>>
>>>> I don't know if this question makes any sense, but I would like to
>>>> ask: does it make sense to store 500 TB (or more) of data in a
>>>> single DN? If yes, then what should the specs of the other
>>>> parameters be, *viz*. NN & DN RAM, N/W, etc.? If no, what could be
>>>> the alternative?
>>>>
>>>> Many thanks.
>>>>
>>>> Regards,
>>>> Mohammad Tariq
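
For anyone who wants to re-run the arithmetic in this thread, here is a
minimal Python sketch. It is not an official sizing formula: the 1 GB of
NameNode heap per 1 million blocks figure is just the rule of thumb quoted
above, and all the constants and variable names are illustrative
assumptions (9 PB raw, 128 MB blocks, 3x replication, 500 TB per DataNode).

    # Sanity check of the NameNode sizing arithmetic from this thread.
    # Assumptions (rules of thumb, not official numbers):
    #   - 1 GB of NN heap per 1 million blocks
    #   - 128 MB block size, 3x replication, 500 TB of disk per DataNode

    TB = 1024            # GB per TB
    PB = 1024 * TB       # GB per PB

    raw_data_gb    = 9 * PB   # 9 PB of data, exclusive of replication
    block_size_mb  = 128
    dn_capacity_gb = 500 * TB

    # Unique blocks; to first order, replicas add disk usage rather than
    # extra namespace objects, which is the "(9*3)" slip noted up-thread.
    blocks = (raw_data_gb * 1024) // block_size_mb

    nn_heap_gb = blocks / 1_000_000                # 1 GB per 1M blocks
    datanodes  = 3 * raw_data_gb / dn_capacity_gb  # 3x replication on disk

    print(f"blocks:       {blocks:,}")        # 75,497,472
    print(f"NN heap (GB): {nn_heap_gb:.1f}")  # ~75.5, i.e. Chris's ~76 GB
    print(f"DataNodes:    {datanodes:.1f}")   # ~55.3 at 500 TB usable each

Under these assumptions the sketch reproduces Chris's ~76 GB heap estimate
(so 128 GB looks comfortable and 256 GB likely wasted), while the DataNode
count comes out near 55 rather than the ~49 quoted above, since it charges
the full 3x replication against 500 TB of usable disk per node.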