On 05/03/2010 07:03 AM, Dave Wright wrote:
I've got a situation where I essentially need dynamic cluster
membership, which has been talked about in ZOOKEEPER-107 but doesn't
look like it's going to happen any time soon.
Could you provide some insight into why you need this? Just so we have
additional background, I'd like to know the use case.
For now, I'm planning on working around this by having a simple
coordinator service on the server nodes that will rewrite the configs
and bounce the servers when membership changes. Clients may get an
error or two and need to reconnect, but the normal error-handling
logic should cover that.
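As a rough illustration of the "rewrite the configs" step, here is a minimal sketch that regenerates the `server.N=host:peerPort:electionPort` lines of zoo.cfg from the current membership. The class name `EnsembleConfig` is hypothetical, and the ports are the conventional 2888/3888 from the ZooKeeper docs. Writing the file (and each node's `myid`) and restarting the servers are left out.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper for a membership coordinator: renders the "server.N=" lines
// of zoo.cfg from an id -> hostname map. Ports are assumed to be the conventional ones.
final class EnsembleConfig {
    static final int PEER_PORT = 2888;     // quorum (follower <-> leader) port
    static final int ELECTION_PORT = 3888; // leader-election port

    /** One "server.<id>=<host>:<peerPort>:<electionPort>" line per member, ordered by id. */
    static String serverLines(Map<Integer, String> members) {
        StringJoiner out = new StringJoiner("\n");
        for (Map.Entry<Integer, String> e : new TreeMap<>(members).entrySet()) {
            out.add("server." + e.getKey() + "=" + e.getValue()
                    + ":" + PEER_PORT + ":" + ELECTION_PORT);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> members = new TreeMap<>();
        members.put(1, "a.example.com");
        members.put(2, "b.example.com");
        members.put(3, "c.example.com");
        // The coordinator would write these lines into every node's zoo.cfg, then bounce the servers.
        System.out.println(serverLines(members));
    }
}
```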
Are you expecting all of the servers to change each time, or just
incremental changes (adding/removing a single server vs., say, moving
the entire cluster from three hosts a/b/c to x/y/z)?
On the client side, I'd really like to dynamically update the server
list without having to re-create the entire ZooKeeper object. Looking at
the code, it seems like it would be pretty trivial to add
"RemoveServer()/AddServer()" functions to ZooKeeper that call down
to ClientCnxn, where the servers are just maintained in a list. Of course,
if the server being removed is the one we're currently connected to, we'd
need to disconnect, but a simple call to disconnect() seems like it would
resolve that and trigger the automatic reconnection logic.
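To make the proposed add/remove behaviour concrete, here is a minimal sketch of a mutable server list with round-robin selection. It is NOT ClientCnxn's actual code: `MutableServerList`, `addServer`, `removeServer` and `next` are hypothetical names, and the `disconnect` Runnable merely stands in for whatever would trigger the client's existing reconnect path.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed AddServer()/RemoveServer() idea. Removing the
// server we are currently connected to invokes a disconnect hook so that the normal
// reconnect logic picks one of the remaining servers.
final class MutableServerList {
    private final List<InetSocketAddress> servers = new ArrayList<>();
    private final Runnable disconnect; // stand-in for the client's disconnect()
    private InetSocketAddress current;
    private int nextIndex = 0;

    MutableServerList(List<InetSocketAddress> initial, Runnable disconnect) {
        servers.addAll(initial);
        this.disconnect = disconnect;
    }

    synchronized void addServer(InetSocketAddress addr) {
        if (!servers.contains(addr)) {
            servers.add(addr);
        }
    }

    void removeServer(InetSocketAddress addr) {
        boolean wasCurrent;
        synchronized (this) {
            int idx = servers.indexOf(addr);
            if (idx < 0) return;
            servers.remove(idx);
            if (idx < nextIndex) nextIndex--; // keep the round-robin position stable
            wasCurrent = addr.equals(current);
            if (wasCurrent) current = null;
        }
        if (wasCurrent) {
            disconnect.run(); // called outside the lock; forces a reconnect elsewhere
        }
    }

    /** Picks the next server round-robin; the (re)connect logic would call this. */
    synchronized InetSocketAddress next() {
        if (servers.isEmpty()) throw new IllegalStateException("no servers configured");
        if (nextIndex >= servers.size()) nextIndex = 0;
        current = servers.get(nextIndex++);
        return current;
    }

    public static void main(String[] args) {
        InetSocketAddress a = InetSocketAddress.createUnresolved("a", 2181);
        InetSocketAddress b = InetSocketAddress.createUnresolved("b", 2181);
        MutableServerList list = new MutableServerList(List.of(a, b),
                () -> System.out.println("current server removed, reconnecting"));
        System.out.println("connected to " + list.next());
        list.removeServer(a);
        System.out.println("reconnected to " + list.next());
    }
}
```

Invoking the hook outside the lock avoids running arbitrary reconnect code while holding the list's monitor.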
You would hook this (add/remove) into JMX? That seems like a good option
to provide.
Any chance you could use DNS for this, i.e. change the mapping for
hostname a to point at x's IP? Since server a will go down anyway, this
would cause the client to reconnect to b or c (and, once the DNS TTL
expires, the client could also end up connecting to x).
If this is an option, be sure to see these issues (there's a bit of work to do):
https://issues.apache.org/jira/browse/ZOOKEEPER-328
https://issues.apache.org/jira/browse/ZOOKEEPER-338
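The DNS approach only helps if the client looks hostnames up again when it reconnects rather than once at startup. The sketch below shows that re-resolution using the standard `InetAddress.getAllByName`; the class name `DnsResolvingHosts` is hypothetical, and it does not describe what the ZooKeeper client currently does. The JVM also caches successful lookups according to the `networkaddress.cache.ttl` security property, so that cache has to expire as well before a remapped name is seen.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep stable hostnames in the connect string and re-resolve
// them on every reconnect attempt, so remapping "a" to x's IP is eventually picked up.
// JVM-level caching is governed by the "networkaddress.cache.ttl" security property.
final class DnsResolvingHosts {
    /** Resolves each "host:port" entry freshly; hosts that no longer resolve are skipped. */
    static List<InetSocketAddress> resolve(List<String> hostPorts) {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            if (colon < 0) throw new IllegalArgumentException("expected host:port, got " + hp);
            String host = hp.substring(0, colon);
            int port = Integer.parseInt(hp.substring(colon + 1));
            try {
                // A name may map to several addresses; offer all of them to the client.
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    out.add(new InetSocketAddress(addr, port));
                }
            } catch (UnknownHostException e) {
                // e.g. the record was removed from DNS; fall through to the other hosts
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("localhost:2181")));
    }
}
```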
You might also look at this patch; we never committed it, but it might be
interesting to you:
https://issues.apache.org/jira/browse/ZOOKEEPER-146
The benefit is that you'd only have one place to make the change, especially
given that clients might be down or unreachable when the change occurs.
Clients would have to poll this service whenever they get disconnected
from the ensemble. One drawback of this approach is that the HTTP service
becomes a potential single point of failure (SPOF), although I guess you
could always fall back to something else, or have a list of HTTP hosts to
do the lookup, etc.
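A minimal sketch of the lookup-with-fallback idea: on disconnect, ask each lookup endpoint in turn for the current server list, and fall back to the last known list if none answer. Everything here is an assumption for illustration: the class `EnsembleLookup`, the endpoint URLs, and the response format (a comma-separated `host:port` string). It is not how the ZOOKEEPER-146 patch works. Fetching is abstracted behind a small interface so the fallback logic can be exercised without a network; `httpFetcher()` shows one way to back it with `java.net.http.HttpClient` (Java 11+).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup client: tries several lookup endpoints so that no single
// HTTP host is a SPOF, and keeps the last good answer as a final fallback.
final class EnsembleLookup {
    interface Fetcher {
        String fetch(String url) throws Exception;
    }

    /** A Fetcher backed by java.net.http.HttpClient (Java 11+). */
    static Fetcher httpFetcher() {
        HttpClient client = HttpClient.newHttpClient();
        return url -> client.send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                                  HttpResponse.BodyHandlers.ofString()).body();
    }

    private final List<String> endpoints;
    private final Fetcher fetcher;
    private List<String> lastKnown;

    EnsembleLookup(List<String> endpoints, Fetcher fetcher, List<String> initial) {
        this.endpoints = List.copyOf(endpoints);
        this.fetcher = fetcher;
        this.lastKnown = List.copyOf(initial);
    }

    /** Tries each endpoint in order; the first non-empty answer wins and is remembered. */
    synchronized List<String> currentServers() {
        for (String url : endpoints) {
            try {
                List<String> servers = parse(fetcher.fetch(url));
                if (!servers.isEmpty()) {
                    lastKnown = List.copyOf(servers);
                    return lastKnown;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop trying
                break;
            } catch (Exception e) {
                // endpoint down or unreachable: try the next one
            }
        }
        return lastKnown; // every endpoint failed: use the last known list
    }

    /** Parses an assumed "host1:port,host2:port" response body. */
    static List<String> parse(String body) {
        List<String> out = new ArrayList<>();
        for (String s : body.trim().split(",")) {
            if (!s.isBlank()) out.add(s.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        Fetcher offline = url -> { throw new java.io.IOException("unreachable: " + url); };
        EnsembleLookup lookup = new EnsembleLookup(List.of("http://lookup1/ensemble"),
                offline, List.of("a:2181", "b:2181", "c:2181"));
        System.out.println(lookup.currentServers()); // falls back to the initial list
    }
}
```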
Does anyone see an issue with that approach?
Were I to create the patch, do you think it would be interesting
enough to merge? It seems like that functionality will be needed for
whatever full dynamic server support is eventually implemented.
It does sound interesting; however, once we add something like this it's
hard to change, given that we try very hard to maintain backward
compatibility. If you did the testing and were able to verify it, I don't
see why we couldn't add it, as it's "optional" in the sense that it
would only be called in the use case you describe. I would feel more
confident if we had more concrete detail on how we intend to do 107 (a
basic functional/design doc that at least reviews all the issues) and
how this would fit in. But I don't think that should necessarily be a
blocker (although others might feel differently).
(FYI, it's good to discuss this sort of thing on zookeeper-dev; please
move responses to that list.)
Sounds like a useful project; I'm interested to hear what others think
about it.

Regards,
Patrick