As someone who has been developing, running, or using the software for longer than the person asking the question, you can best serve them by making them aware of the trade-offs and why it's a good or bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.
Having said that, it would be really nice to stop this thread from becoming more about how to answer questions than about answering the question itself.

Bringing the thread back on track: Jay, you can certainly run ZooKeeper alongside the Datanode and Region Server processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily due to I/O), which will cause ZK some grief.

It is generally recommended to collocate in the following groups:

- Datanode + Region Servers on the same physical nodes
- ZooKeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
- Namenode on an independent node
- Secondary Namenode on an independent node

These are the general recommendations, and different environments might warrant different decisions. For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to collocate the Namenode, ZooKeeper and HBase Master on the same physical host.

Hope that helps

-Amandeep

On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:

> I am not finding fault with what Andy was saying. The problem is that we tend
> not to use stronger language when discussing these topics. And my point
> wasn't just on this topic but other posts where we say 'not a good idea' yet
> someone still pursues the idea until there's a chorus of saying not to do
> something. I'm not faulting the poster because he wasn't and isn't the only
> one who does this... We see it all the time where someone goes down the wrong
> path, and is looking for a quick solution, rather than following the
> recommendation.