On 05/03/2010 07:03 AM, Dave Wright wrote:
I've got a situation where I essentially need dynamic cluster
membership, which has been talked about in ZOOKEEPER-107 but doesn't
look like it's going to happen any time soon.


Could you provide some insight into why you need this? I'm interested in the use case, just so we have additional background.

For now, I'm planning on working around this by having a simple
coordinator service on the server nodes that will re-write the configs
and bounce the servers when membership changes. Clients may get
an error or two and need to reconnect, but that should be handled by
the normal error logic.


Are you expecting all of the servers to change each time, or just incremental changes (add/remove a single server, vs. say moving the entire cluster from 3 hosts a/b/c to x/y/z)?

On the client side, I'd really like to dynamically update the server
list w/o having to re-create the entire Zookeeper object. Looking at
the code, it seems like it would be pretty trivial to add
"RemoveServer()/AddServer()" functions for Zookeeper that calls down
to ClientCnxn, where they are just maintained in a list. Of course if
the server being removed is the one currently connected, we'd need to
disconnect, but a simple call to disconnect() seems like it would
resolve that and trigger the automatic re-connection logic.


You would hook this (add/remove) into JMX? That seems like a good option to provide.
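To make the add/remove idea concrete, here is a minimal sketch of a mutable server list. The class name MutableServerList, its methods, and the disconnect callback are all hypothetical and are not part of the ZooKeeper client API; the sketch only shows the rule described above, where removing the currently connected server triggers a disconnect so the normal reconnect logic can pick another host.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the ZooKeeper API.
class MutableServerList {
    private final List<InetSocketAddress> servers = new ArrayList<>();
    private final Runnable disconnect;   // assumed hook that drops the live connection
    private InetSocketAddress current;   // server we are connected to, or null

    MutableServerList(List<InetSocketAddress> initial, Runnable disconnect) {
        this.servers.addAll(initial);
        this.disconnect = disconnect;
    }

    // Add a server unless it is already in the list.
    synchronized void addServer(InetSocketAddress addr) {
        if (!servers.contains(addr)) {
            servers.add(addr);
        }
    }

    // Remove a server; if it is the one in use, force a disconnect
    // so the caller's reconnect logic chooses from the remaining hosts.
    synchronized boolean removeServer(InetSocketAddress addr) {
        boolean removed = servers.remove(addr);
        if (removed && addr.equals(current)) {
            current = null;
            disconnect.run();
        }
        return removed;
    }

    // Record which server the connection logic ended up on.
    synchronized void markConnected(InetSocketAddress addr) {
        current = addr;
    }

    // Copy of the list for the reconnect logic to iterate over.
    synchronized List<InetSocketAddress> snapshot() {
        return new ArrayList<>(servers);
    }
}
```

The same two methods could presumably be exposed through a JMX bean, but that wiring is left out here.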

Any chance you could use DNS for this? I.e., change the mapping for the hostname from a's IP to x's IP? Since server a will go down anyway, this would cause the client to reconnect to b/c (eventually, when the DNS TTL expires, the client could also connect to x).

If this is an option be sure to see (a bit of work to do):
https://issues.apache.org/jira/browse/ZOOKEEPER-328
https://issues.apache.org/jira/browse/ZOOKEEPER-338
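As a rough illustration of the DNS approach, the sketch below keeps hostnames rather than resolved addresses and resolves them again on every reconnect attempt. The class ReResolvingHostList is hypothetical and does not reflect how the ZooKeeper client resolves hosts. Note also that the JVM caches successful lookups (see the networkaddress.cache.ttl security property), so a changed mapping may not be visible immediately.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the ZooKeeper API.
class ReResolvingHostList {
    private final List<String> hostnames;
    private final int port;

    ReResolvingHostList(List<String> hostnames, int port) {
        this.hostnames = new ArrayList<>(hostnames);
        this.port = port;
    }

    // Call this before each reconnect attempt instead of resolving once at startup.
    List<InetSocketAddress> resolveNow() {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String host : hostnames) {
            try {
                // A name may map to several addresses; keep all of them.
                for (InetAddress ip : InetAddress.getAllByName(host)) {
                    out.add(new InetSocketAddress(ip, port));
                }
            } catch (UnknownHostException e) {
                // Name currently has no mapping; skip it and try the others.
            }
        }
        return out;
    }
}
```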

You might also look at this patch; we never committed it, but it might be interesting to you:
https://issues.apache.org/jira/browse/ZOOKEEPER-146

The benefit is that you'd only have one place to make the change, especially given that clients might be down/unreachable when this change occurs. Clients would have to poll this service whenever they get disconnected from the ensemble. One drawback of this approach is that the HTTP service now becomes a potential SPOF (although I guess you could always fall back to something, or potentially have a list of HTTP hosts to do the lookup, etc.).
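The fallback idea could look roughly like the sketch below: try each lookup URL in order on disconnect, and fall back to the last known server list if none of them answers. ServerListLookup, the example URLs, and the injected fetch function are all assumptions made for illustration; the actual HTTP call is left to the caller, and this is not taken from the patch above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, NOT part of the ZooKeeper API.
class ServerListLookup {
    private final List<String> lookupUrls;
    // Assumed hook: given a URL, return a connect string such as
    // "x:2181,y:2181,z:2181", or empty if the host did not answer.
    private final Function<String, Optional<String>> fetch;
    private String lastKnown;

    ServerListLookup(List<String> lookupUrls,
                     Function<String, Optional<String>> fetch,
                     String initialConnectString) {
        this.lookupUrls = new ArrayList<>(lookupUrls);
        this.fetch = fetch;
        this.lastKnown = initialConnectString;
    }

    // Call on disconnect: the first lookup host that answers wins;
    // otherwise keep using the last list we knew about.
    synchronized String currentConnectString() {
        for (String url : lookupUrls) {
            Optional<String> result;
            try {
                result = fetch.apply(url);
            } catch (RuntimeException e) {
                result = Optional.empty(); // treat a failing host as "no answer"
            }
            if (result.isPresent() && !result.get().isEmpty()) {
                lastKnown = result.get();
                return lastKnown;
            }
        }
        return lastKnown;
    }
}
```

Keeping the last known list means a lookup outage only delays membership updates rather than preventing clients from reconnecting.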

Does anyone see an issue with that approach?
Were I to create the patch, do you think it would be interesting
enough to merge? It seems like that functionality will eventually be
needed for whatever full dynamic server support is eventually
implemented.

It does sound interesting; however, once we add something like this it's hard to change, given that we try very hard to maintain backward compatibility. If you did the testing and were able to verify it, I don't see why we couldn't add it, as it's "optional" in the sense that it would only be called in the use case you describe. I would feel more confident if we had more concrete detail on how we intend to do 107 (a basic functional/design doc that at least reviews all the issues), and how this would fit in. But I don't see that this should necessarily be a blocker (although others might feel differently).

(fyi it's good to discuss this sort of thing on zookeeper-dev, please move responses to that list)

Sounds like a useful project; I'm interested to hear what others think about it. Regards,

Patrick
