In addition to what the others have said, I will repeat my standard advice (gleaned from listening to this list for the last year): If you have 10 nodes or fewer, then you want 1 master node (namenode, jobtracker, hbase master, zookeeper node) and the remaining nodes, up to 9, as slave nodes (datanode, tasktracker, hbase region server).
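For what it's worth, the small-cluster rule above can be written down as a short Python sketch. The function name, the role labels, and the choice of the first host as the master are my own placeholders, not anything Hadoop or HBase defines, and I am assuming "9 slave nodes" generalizes to "every node except the master".

```python
# Role labels are illustrative strings only, not real service or config names.
MASTER_ROLES = ["namenode", "jobtracker", "hbase-master", "zookeeper"]
SLAVE_ROLES = ["datanode", "tasktracker", "hbase-regionserver"]


def small_cluster_layout(hostnames):
    """Map each host to its roles for a cluster of 10 nodes or fewer.

    Assumption: the first host in the list becomes the single master
    and every other host becomes a slave.
    """
    if not 2 <= len(hostnames) <= 10:
        raise ValueError("this sketch only covers clusters of 2 to 10 nodes")
    layout = {hostnames[0]: list(MASTER_ROLES)}
    for host in hostnames[1:]:
        layout[host] = list(SLAVE_ROLES)
    return layout


if __name__ == "__main__":
    # Hypothetical hostnames for a 6 node cluster.
    hosts = [f"node{i:02d}" for i in range(1, 7)]
    for host, roles in small_cluster_layout(hosts).items():
        print(host, ", ".join(roles))
```

Anything above 10 nodes is rejected on purpose, since that is where the advice changes.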
If you have more than 10 nodes, then you want 3 or 5 zookeeper nodes. Zookeeper nodes can share hardware with other services as long as they have a dedicated disk (dedicated bandwidth wouldn't hurt, but is probably not necessary). And if you have a lot of nodes, then you want an odd number of zookeeper nodes with one node per rack at minimum.

I don't believe that hbase master takes many resources, so having it share the master node is not a problem.

I have a 6 node cluster that I share with Solr, and this setup works well. So well that I am having a hard time convincing anyone to get me more hardware.

Dave

-----Original Message-----
From: Joseph Coleman [mailto:joe.cole...@infinitecampus.com]
Sent: Tuesday, February 01, 2011 7:51 AM
To: hbase-u...@hadoop.apache.org
Subject: Hadoop setup question.

Hi all,

Not sure where to ask this question, but here it goes. I have been playing with Hadoop for a while now in a test environment before we set up and deploy a production environment. I am currently using Hadoop 0.20.0 on Ubuntu 10.04 LTS installed on Dell 1950s.

My question is: what RAID level should I be using for my data nodes? I haven't come across anything that clearly spells it out. I have used RAID 1 with an EXT4 filesystem, but after further research I know this isn't right, and I am not sure what to do.

I will be setting up 3 masters in a cluster, which I will RAID out, and roughly 10 datanodes running hdfs and hbase, plus a separate zookeeper cluster. Any thoughts or recommendations on the clustering would be much appreciated.

Thanks,
Joe