Hi David,
  Sharing the cluster with HDFS and MapReduce might cause significant
problems. MapReduce is very I/O intensive, and this might cause a lot of
unnecessary hiccups in your cluster. If you really want to share the
nodes, I would suggest providing at least the following:

- a considerable amount of memory, say 400-500 MB (depending on your
usage), for the Java heap
- one dedicated disk not used by MR or the DataNodes, so that ZooKeeper
performance is a little more predictable for you.
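
As a rough sketch of the two points above, a zoo.cfg along these lines
would put the transaction log on its own disk. The paths are
hypothetical examples, not anything from your setup; dataDir and
dataLogDir are the standard ZooKeeper settings for the snapshot and
transaction log directories. The heap size is set separately through
the JVM option -Xmx (for example -Xmx512m) wherever you start the
server's JVM.

```
# zoo.cfg (sketch only; both paths below are assumed, adjust to your boxes)
tickTime=2000
clientPort=2181
# Snapshot directory
dataDir=/var/lib/zookeeper
# Transaction log on a disk that MR and the DataNodes do not write to
dataLogDir=/mnt/zk-disk/txlog
```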

Thanks
mahadev


On 3/8/10 10:58 AM, "David Rosenstrauch" <dar...@darose.net> wrote:

> I'm contemplating an upcoming zookeeper rollout and was wondering what
> the zookeeper brain trust here thought about a network deployment question:
> 
> Is it generally considered bad practice to just deploy zookeeper on our
> existing hdfs/MR nodes?  Or is it better to run zookeeper instances on
> their own dedicated nodes?
> 
> On the one hand, we're not going to be making heavy-duty use of
> zookeeper, so it might be sufficient for zookeeper nodes to share box
> resources with HDFS & MR.  On the other hand, though, I don't want
> zookeeper to become unavailable if the nodes are running a resource
> intensive job that's hogging CPU or network.
> 
> 
> What's generally considered best practice for Zookeeper?
> 
> Thanks,
> 
> DR
