In addition to what the others have said, I will repeat my standard advice 
(gleaned from listening to this list for the last year):
If you have 10 nodes or fewer, then you want
1 Master node (namenode, jobtracker, hbase master, zookeeper node)
Up to 9 slave nodes (datanode, tasktracker, hbase region server)

If you have more than 10 nodes, then you want 3 or 5 zookeeper nodes. Zookeeper 
nodes can share hardware with other services as long as they have a dedicated 
disk (dedicated bandwidth wouldn't hurt, but is probably not necessary). And if 
you have a lot of nodes, then you want an odd number of zookeeper nodes with 
one node per rack at minimum.
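
To illustrate why 3 or 5 (an odd number) is the usual suggestion: a Zookeeper 
ensemble keeps working only while a strict majority of its members is up. The 
sketch below is just that arithmetic; the function names are made up for 
illustration and are not part of any Zookeeper or HBase API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many members."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many members can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in range(1, 7):
        print(f"{n} zookeeper nodes: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Under this arithmetic, going from 3 to 4 nodes (or from 5 to 6) does not let 
you lose any more nodes, so the extra even node buys no additional fault 
tolerance.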

I don't believe that the hbase master takes many resources, so having it share 
the master node is not a problem.

I have a 6 node cluster that I share with Solr, and this setup works well. So 
well that I am having a hard time convincing anyone to get me more hardware.

Dave


-----Original Message-----
From: Joseph Coleman [mailto:joe.cole...@infinitecampus.com] 
Sent: Tuesday, February 01, 2011 7:51 AM
To: hbase-u...@hadoop.apache.org
Subject: Hadoop setup question.

Hi all, I'm not sure where to ask this question, but here it goes. I have been 
playing with Hadoop for a while now in a test environment before we set up and 
deploy a production environment. I am using Hadoop 0.20.0 on Ubuntu 10.04 LTS, 
currently installed on Dell 1950s.

My question is: what RAID should I be using for my data nodes? I haven't come 
across anything that clearly spells it out. I have used RAID 1 with an ext4 
filesystem, but after further research I know this isn't right, and I'm not 
sure what to do. I will be setting up 3 masters in a cluster, which I will RAID 
out, and roughly 10 datanodes running hdfs and hbase, plus a separate zookeeper 
cluster. Any thoughts or recommendations on the clustering would be much 
appreciated.

Thanks,
Joe

