Hi Patrick,

Thanks for your input.
I am planning on having 3 zk servers per data centre, with perhaps only 2 in the tie-breaker site. The traffic between zk and the applications will mostly be local reads - "who is the primary database?". Changes to the config will be rare (server rebuilds, etc. - i.e. planned changes) or caused by server / network / site failure.

The interesting thing in my mind is how ZooKeeper will cope with inter-site link failure - how quickly the remote sites will notice, and how quickly normality can be resumed when the link reappears. I need to get this running in the lab and start pulling out wires.

regards,
Martin

On 8 March 2010 17:39, Patrick Hunt <ph...@apache.org> wrote:

> IMO latency is the primary issue you will face, but also keep in mind
> reliability w/in a colo.
>
> Say you have 3 colos (obviously it can't be 2); if you only have 3 servers,
> one in each colo, you will be reliable, but clients w/in each colo will have
> to connect to a remote colo if the local server fails. You will want to
> prioritize the local colo, given that reads can then be serviced entirely
> locally. If you have 7 servers (2-2-3) that would be better - if a local
> server fails you have a redundant one; if both fail, then you go remote.
>
> You want to keep your writes as few as possible and as small as possible.
> Why? Say you have 100ms latency between colos; let's go through a scenario
> for a client in a colo where the local servers are not the leader (zk
> cluster leader).
>
> read:
> 1) client reads a znode from local server
> 2) local server (usually < 1ms if "in colo" comm) responds in 1ms
>
> write:
> 1) client writes a znode to local server A
> 2) A proposes the change to the ZK Leader (L) in a remote colo
> 3) L gets the proposal in 100ms
> 4) L proposes the change to all followers
> 5) all followers (not exactly, but hopefully) get the proposal in 100ms
> 6) followers ack the change
> 7) L gets the acks in 100ms
> 8) L commits the change (message to all followers)
> 9) A gets the commit in 100ms
> 10) A responds to the client (< 1ms)
>
> write latency: 100 + 100 + 100 + 100 = 400ms
>
> Obviously keeping these writes small is also critical.
>
> Patrick
>
> Martin Waite wrote:
>
>> Hi Ted,
>>
>> If the links do not work for us for zk, then they are unlikely to work
>> with any other solution - such as trying to stretch Pacemaker or Red Hat
>> Cluster with their multicast protocols across the links.
>>
>> If the links are not good enough, we might have to spend some more money
>> to fix this.
>>
>> regards,
>> Martin
>>
>> On 8 March 2010 02:14, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> If you can stand the latency for updates then zk should work well for
>>> you. It is unlikely that you will be able to do better than zk does and
>>> still maintain correctness.
>>>
>>> Do note that you can probably bias clients to use a local server. That
>>> should make things more efficient.
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 7, 2010, at 3:00 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>>>
>>>> The inter-site links are a nuisance. We have two data-centres with 100Mb
>>>> links which I hope would be good enough for most uses, but we need a 3rd
>>>> site - and currently that only has 2Mb links to the other sites. This
>>>> might be a problem.
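
A minimal sketch of Ted's suggestion above - biasing a client toward its own colo by handing it a connection string that lists only the local servers, so reads stay in-colo while writes are still forwarded to the leader wherever it lives. The host names and the znode path here are made up for illustration; the ZooKeeper Java client calls are the standard ones.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class LocalColoClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical hosts: only the servers in this client's own data
            // centre, so the session (and therefore reads) stays local.
            String localServers =
                "zk1.local-dc:2181,zk2.local-dc:2181,zk3.local-dc:2181";

            ZooKeeper zk = new ZooKeeper(localServers, 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    System.out.println("Session event: " + event.getState());
                }
            });

            // A local read: "who is the primary database?"
            // (hypothetical znode path)
            byte[] data = zk.getData("/cluster/primary-db", false, null);
            System.out.println("Primary: " + new String(data, "UTF-8"));

            zk.close();
        }
    }

The trade-off is that if every server in the local colo goes down, this client has nowhere to fail over to - which is part of why Patrick suggests at least 2 servers per colo rather than 1.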