RE: Zookeeper WAN Configuration

Todd Greenwood Mon, 27 Jul 2009 16:00:55 -0700

Flavio, more questions inline:

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com] 
Sent: Sunday, July 26, 2009 12:49 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

Todd, Answers inline:

On Jul 26, 2009, at 11:05 AM, Todd Greenwood wrote:

> Flavio, thank you for the suggestion.
>
> I have looked at the documentation (relevant snippets pasted in
> below), and looked at the presentations
> (http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations),
> but I still have some questions about WAN configuration:
>
> ---------------------------------------------------------------
> WAN
> ----
> A <-> B
> A <-> C
> A <-> D
>
> A is a central processing hub (DC).
> B-D are remote colo edge nodes (PODS).
> Each POD contains (m) ZK Servers with (q) client connections.
> ---------------------------------------------------------------
>
> What are the advantages and disadvantages to co-locating ZK Servers  
> across a WAN? Could you correct my admittedly naïve assumptions here?
>
> 1. ZK Servers within a POD would significantly improve read/write  
> performance within a given POD, vs. clients within the POD opening
> connections to the DC.
>

I'm assuming that you're setting the weight of ZooKeeper servers in  
PODs to zero, which means that their votes when ordering updates do  
not count.

[Todd] Correct.

If my assumption is correct, then you should see a significant  
improvement in read performance. I would say that write performance  
wouldn't be very different from clients in PODs opening a direct  
connection to DC.

[Todd] So the Leader, knowing that machine(s) have a voting weight of zero, 
doesn't have to wait for their responses in order to form a quorum vote? Does 
the leader even send voting requests to the weight zero followers?

> 2. ZK Servers within a POD would provide local file transacted  
> storage of writes, obviating the need to write that code ourselves.
>

Yes, local zk servers in PODs receive all updates and process them as  
any other zk server.

> 3. ZK Servers within the POD would be resilient to network  
> connectivity failure between the POD and the DC. Once connectivity
> is re-established, the ZK Servers in the POD would sync with the ZK
> servers in the DC, and, from the perspective of a client within the  
> POD, everything just worked, and there was no network failure.
>

We want to have servers switching to read-only mode upon network  
partitions, but this is a feature under development. We don't have  
plans for implementing any model of eventual consistency that would  
allow updates even when not being able to form a quorum, and I  
personally believe that it would be a major change, with major  
implications not only to the code base, but also to the semantics of  
our API.

[Todd] What is the current (3.2) behaviour in the case of a network failure 
that prevents connectivity between ZK Servers in a pod? Assuming the pod is 
composed of weight=0 followers... are the clients connected to these zookeeper 
servers still able to read? Do they get exceptions on write? Do the clients 
hang if it's a synchronous call? 

> 4. A WAN topology of co-located ZK servers in both the DC and (n)  
> PODs would not significantly degrade the performance of the  
> ensemble, provided large blobs of traffic were not being sent across  
> the network.

If the zk servers in the PODs are assigned weight zero, then I don't  
see a reason for having lower performance in the scenario you  
describe. If weights are greater than zero for zk servers in PODs,  
then your performance might be affected, but there are ways of  
assigning weights that do not require receiving votes from all
co-locations for progress.
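
To illustrate the weighting idea discussed above, here is a minimal Python sketch of a weighted, grouped quorum check. It is a simplified model, not ZooKeeper's actual code. The group.N and weight.N keys follow ZooKeeper's hierarchical quorum options, but the server ids, the two-group layout (ids 1-3 in the DC, ids 4-5 in a POD), and the helper names parse and is_quorum are assumptions made up for this example.

```python
# Hypothetical zoo.cfg fragment: servers 1-3 sit in the DC, 4-5 in a POD.
# POD servers get weight 0, so their votes never count toward a quorum.
CONFIG = """
group.1=1:2:3
group.2=4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=0
weight.5=0
"""


def parse(config):
    """Read group.N and weight.N lines into two dicts keyed by integer id."""
    groups, weights = {}, {}
    for line in config.split():
        key, value = line.split("=")
        kind, ident = key.split(".")
        if kind == "group":
            groups[int(ident)] = [int(s) for s in value.split(":")]
        elif kind == "weight":
            weights[int(ident)] = int(value)
    return groups, weights


def is_quorum(acks, groups, weights):
    """Return True if acks forms a quorum under this simplified model.

    A group is won when the acking servers hold more than half of its
    total weight. A quorum needs more than half of the groups whose
    total weight is non-zero; all-zero groups are ignored entirely.
    """
    voting = {g: m for g, m in groups.items()
              if sum(weights[s] for s in m) > 0}
    won = 0
    for members in voting.values():
        total = sum(weights[s] for s in members)
        got = sum(weights[s] for s in members if s in acks)
        if 2 * got > total:
            won += 1
    return 2 * won > len(voting)


groups, weights = parse(CONFIG)
print(is_quorum({1, 2}, groups, weights))     # two DC servers
print(is_quorum({1, 4, 5}, groups, weights))  # one DC server plus the POD
```

In this model two DC servers are enough to make progress, while acknowledgements from the weight-zero POD servers make no difference either way, which is why write latency should not depend on the WAN links to the PODs.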

[Todd] Great, we'll proceed with hierarchical configuration w/ ZK Servers in 
pods having a voting weight of zero. Could you provide a pointer to a 
configuration that shows this? The docs are a bit lean in this regard...

-Flavio
