Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Martin Waite
Hi Ted, if the links do not work for us with zk, they are unlikely to work with any other solution, such as stretching Pacemaker or Red Hat Cluster (with their multicast protocols) across the links. If the links are not good enough, we might have to spend some more money to fix this.

Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Patrick Hunt
IMO latency is the primary issue you will face, but also keep in mind reliability within a colo. Say you have 3 colos (obviously it can't be 2). If you only have 3 servers, one in each colo, you will be reliable, but clients within each colo will have to connect to a remote colo if the local one fails. You

Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread David Rosenstrauch
I'm contemplating an upcoming zookeeper rollout and was wondering what the zookeeper brain trust here thought about a network deployment question: Is it generally considered bad practice to just deploy zookeeper on our existing hdfs/MR nodes? Or is it better to run zookeeper instances on

Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Martin Waite
Hi Patrick, Thanks for your input. I am planning on having 3 zk servers per data centre, with perhaps only 2 in the tie-breaker site. The traffic between zk and the applications will be lots of local reads ("who is the primary database?"). Changes to the config will be rare (server rebuilds, etc
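A minimal Python sketch of the majority-quorum arithmetic behind the "can't be 2" remark and the 3/3/2 layout above. It assumes every server is a voting participant (no observers) and that a whole site fails at once; the site names are hypothetical.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters`, which a ZooKeeper ensemble needs to make progress."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return voters - quorum_size(voters)


def survives_any_single_site_loss(layout: dict[str, int]) -> bool:
    """True if, after any one whole site goes offline, the remaining
    servers still form a majority of the full ensemble.

    Assumes every server listed is a voting participant."""
    total = sum(layout.values())
    needed = quorum_size(total)
    return all(total - site_servers >= needed for site_servers in layout.values())


if __name__ == "__main__":
    # Hypothetical site names; server counts follow the thread.
    layouts = {
        "two sites, 3 + 3": {"dc1": 3, "dc2": 3},
        "two sites, 3 + 2": {"dc1": 3, "dc2": 2},
        "three sites, 1 + 1 + 1": {"dc1": 1, "dc2": 1, "dc3": 1},
        "three sites, 3 + 3 + 2": {"dc1": 3, "dc2": 3, "tiebreak": 2},
    }
    for name, layout in layouts.items():
        total = sum(layout.values())
        print(f"{name}: quorum={quorum_size(total)}, "
              f"survives site loss={survives_any_single_site_loss(layout)}")
```

With 3 + 3 + 2 = 8 voters a quorum is 5, so losing either 3-server site leaves exactly 5 and the ensemble keeps going; with only two sites, losing the larger one always leaves a minority. Note that 8 voters tolerate 3 failures, the same as 7, so under these assumptions the eighth server adds no extra fault tolerance.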

Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Patrick Hunt
See the troubleshooting page (http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting); there is some apropos detail there, especially relative to virtual environments. ZK servers are sensitive to IO (disk/network) latency. As long as you don't have very sensitive latency requirements, it should be fine. If the machine

Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Mahadev Konar
Hi Martin, The results would be really nice information to have on the ZooKeeper wiki. It would be very helpful for others considering the same kind of deployment, so do send out your results on the list. Thanks mahadev On 3/8/10 11:18 AM, Martin Waite waite@googlemail.com wrote: Hi Patrick,

Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread David Rosenstrauch
On 03/08/2010 02:21 PM, Patrick Hunt wrote: See the troubleshooting page; there is some apropos detail there (especially relative to virtual environments). http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting ZK servers are sensitive to IO (disk/network) latency. As long as you don't have very sensitive latency

Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Patrick Hunt
That's controlled by tickTime, syncLimit, initLimit, etc.; see the admin guide (http://bit.ly/c726DC) for more about this. You'll want to increase these from the defaults, since those are typically tuned for a high-performance interconnect (i.e., within a colo). You are correct though, much will depend on your
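As a rough illustration of why the defaults need raising across sites: initLimit and syncLimit are counted in ticks, so their wall-clock timeouts scale with tickTime. The Python sketch below does that arithmetic. The session bounds assume the admin guide's default minSessionTimeout (2 ticks) and maxSessionTimeout (20 ticks); the `ticks_needed` helper and the example values are hypothetical, not recommendations from this thread.

```python
import math


def zk_timeouts_ms(tick_time_ms: int, init_limit: int, sync_limit: int) -> dict[str, int]:
    """Wall-clock timeouts implied by ZooKeeper's tick-based settings.

    initLimit and syncLimit are expressed in ticks, so each timeout is
    limit * tickTime. Session bounds assume the default 2 and 20 ticks."""
    return {
        "init_ms": init_limit * tick_time_ms,  # followers connecting and syncing to the leader
        "sync_ms": sync_limit * tick_time_ms,  # how far a follower may lag before being dropped
        "min_session_ms": 2 * tick_time_ms,    # default minSessionTimeout
        "max_session_ms": 20 * tick_time_ms,   # default maxSessionTimeout
    }


def ticks_needed(budget_ms: int, tick_time_ms: int) -> int:
    """Hypothetical helper: smallest limit, in ticks, that covers a latency budget."""
    return math.ceil(budget_ms / tick_time_ms)


if __name__ == "__main__":
    # Hypothetical WAN-oriented values, chosen only to show the arithmetic.
    print(zk_timeouts_ms(tick_time_ms=4000, init_limit=10, sync_limit=5))
    # A 15 s sync budget with a 4 s tick rounds up to a syncLimit of 4 ticks.
    print(ticks_needed(budget_ms=15000, tick_time_ms=4000))
```

Raising tickTime stretches every tick-based timeout at once, including the session bounds, whereas raising initLimit or syncLimit only affects server-to-server timing; which knob to turn depends on whether the concern is inter-site replication or client session stability.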

Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Ted Dunning
I have used 5 and 3 in different clusters. Moderate sharing is reasonable, but sharing with less intensive applications is definitely better. Sharing with the job tracker, for instance, is likely fine since it doesn't abuse the disk so much. The namenode is similar, but not quite as nice.