Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Patrick Hunt
Another benefit of ZOOKEEPER-146 - we could use this for some sort of 
load balancing amongst the ensemble members. The first version could 
return a static list, however I can see where the HTTPD might be updated 
to monitor the load on the servers/ensemble and prioritize the list for 
each client request...


Patrick

On 05/03/2010 09:34 AM, Patrick Hunt wrote:


On 05/03/2010 07:03 AM, Dave Wright wrote:

I've got a situation where I essentially need dynamic cluster
membership, which has been talked about in ZOOKEEPER-107 but doesn't
look like it's going to happen any time soon.



Could you provide some insight into why you need this? Just so we have
addl background, I'm interested to know the use case.


For now, I'm planning on working around this by having a simple
coordinator service on the server nodes that will re-write the configs
and bounce the servers when membership changes. Clients may get
an error or two and need to reconnect, but that should be handled by
the normal error logic.



Are you expecting all of the servers to change each time, or just
incremental changes (add/remove a single server, vs say move the entire
cluster from 3 hosts a/b/c to x/y/z)


On the client side, I'd really like to dynamically update the server
list w/o having to re-create the entire Zookeeper object. Looking at
the code, it seems like it would be pretty trivial to add
RemoveServer()/AddServer() functions for Zookeeper that calls down
to ClientCnxn, where they are just maintained in a list. Of course if
the server being removed is the one currently connected, we'd need to
disconnect, but a simple call to disconnect() seems like it would
resolve that and trigger the automatic re-connection logic.
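
A minimal sketch of the add/remove idea described above, not the actual ZooKeeper client code. `DynamicServerList`, `connectToNext()` and the disconnect counter are hypothetical stand-ins for the list that ClientCnxn maintains and for its real reconnection logic.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the server list a client connection maintains. */
class DynamicServerList {
    private final List<InetSocketAddress> servers = new ArrayList<>();
    private InetSocketAddress current;   // server we are connected to, or null
    private int next = 0;                // round-robin position
    private int forcedDisconnects = 0;   // for illustration only

    synchronized void addServer(String host, int port) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved(host, port);
        if (!servers.contains(addr)) {
            servers.add(addr);
        }
    }

    /** Removes a server; drops the connection if it was the one in use. */
    synchronized boolean removeServer(String host, int port) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved(host, port);
        boolean removed = servers.remove(addr);
        if (removed && addr.equals(current)) {
            disconnect();
        }
        return removed;
    }

    /** Stands in for the automatic re-connection logic: pick the next server. */
    synchronized InetSocketAddress connectToNext() {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no servers configured");
        }
        int idx = next % servers.size();
        next = idx + 1;
        current = servers.get(idx);
        return current;
    }

    synchronized InetSocketAddress current() {
        return current;
    }

    synchronized int forcedDisconnects() {
        return forcedDisconnects;
    }

    // A real client would close the socket here and let reconnect kick in.
    private void disconnect() {
        current = null;
        forcedDisconnects++;
    }
}
```

Removing a server that is not the current one leaves the connection untouched; only removal of the connected server forces a disconnect.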



You would hook this (add/remove) into JMX? That seems like a good option
to provide.

Any chance you could use DNS for this? i.e. change the mapping for the
hostname from a -> x ip? Since the server a will go down anyway, this
would cause the client to reconnect to b/c (eventually when dns ttl
expires the client would also potentially connect to x).

If this is an option be sure to see (a bit of work to do):
https://issues.apache.org/jira/browse/ZOOKEEPER-328
https://issues.apache.org/jira/browse/ZOOKEEPER-338

You might also look at this patch, we never committed it but it might be
interesting to you:
https://issues.apache.org/jira/browse/ZOOKEEPER-146

The benefit is that you'd only have one place to make the change, esp
given that clients might be down/unreachable when this change occurs.
Clients would have to poll this service whenever they get disconnected
from the ensemble. One drawback of this approach is that the HTTP now
becomes a potential SPOF. (although I guess you could always fall back
to something, or potentially have a list of HTTP hosts to do the lookup,
etc...).


Does anyone see an issue with that approach?
Were I to create the patch, do you think it would be interesting
enough to merge? It seems like that functionality will eventually be
needed for whatever full dynamic server support is eventually
implemented.


It does sound interesting, however once we add something like this it's
hard to change given that we try very hard to maintain b/w
compatibility. If you did the testing and were able to verify I don't
see why we couldn't add it - as it's optional in the sense that it
would only be called in the use case you describe. I would feel more
confident if we had more concrete detail on how we intend to do 107 (a
basic functional/design doc that at least reviews all the issues), and
how this would fit in. But I don't see that should necessarily be a
blocker (although others might feel differently).

(fyi it's good to discuss this sort of thing on zookeeper-dev, please
move responses to that list)

Sounds like a useful project, I'm interested to hear what others think
about it. Regards,

Patrick


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Dave Wright
 Could you provide some insight into why you need this? Just so we have addl
 background, I'm interested to know the use case.

Sure, we're building a clustered application that will use zookeeper
as part of it. We need to manage ZK ourselves. The cluster running the
app & ZK may change over time (nodes added or removed) and we need to
keep ZK itself in-sync with any changes. They won't be common, but we
can't shut the app down to make the changes, it needs to be
transparent.


 Are you expecting all of the servers to change each time, or just
 incremental changes (add/remove a single server, vs say move the entire
 cluster from 3 hosts a/b/c to x/y/z)

I'd expect a small number of changes at any time - a few nodes being
added, a few nodes being removed. Most of the nodes will stay the
same.


 Any chance you could use DNS for this? i.e. change the mapping for the
 hostname from a -> x ip? Since the server a will go down anyway, this would
 cause the client to reconnect to b/c (eventually when dns ttl expires the
 client would also potentially connect to x).
 https://issues.apache.org/jira/browse/ZOOKEEPER-328
 https://issues.apache.org/jira/browse/ZOOKEEPER-338


Well, there are a lot of issues with DNS (including security & caching)
so I'd prefer to avoid it. Also, the real issue is that the number of
servers is changing, not just their IPs.
Although we probably wouldn't use it, I do think it would be nice to
support a single hostname for the ZK cluster with one A record for
each member, and have the ZK client handle resolving that properly
each time it connects.
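
A rough sketch of the single-hostname idea, assuming one A record per ensemble member. `DnsEnsembleResolver` and `NameResolver` are hypothetical names, not part of the ZooKeeper client; the only real API used is `java.net.InetAddress`. Note that the JVM caches lookups (see the `networkaddress.cache.ttl` security property), so re-resolving on each connect only helps if that cache is allowed to expire.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook so the DNS lookup can be swapped out or faked. */
interface NameResolver {
    InetAddress[] resolve(String hostname) throws UnknownHostException;
}

/** Re-resolves one cluster hostname into its member addresses on demand. */
class DnsEnsembleResolver {
    private final String hostname;
    private final int port;
    private final NameResolver resolver;
    private List<InetSocketAddress> lastKnown = new ArrayList<>();

    DnsEnsembleResolver(String hostname, int port) {
        // Default: ask the platform resolver for every address of the name.
        this(hostname, port, InetAddress::getAllByName);
    }

    DnsEnsembleResolver(String hostname, int port, NameResolver resolver) {
        this.hostname = hostname;
        this.port = port;
        this.resolver = resolver;
    }

    /** Call before each connection attempt; falls back to the last good result. */
    synchronized List<InetSocketAddress> currentServers() {
        try {
            List<InetSocketAddress> fresh = new ArrayList<>();
            for (InetAddress address : resolver.resolve(hostname)) {
                fresh.add(new InetSocketAddress(address, port));
            }
            if (!fresh.isEmpty()) {
                lastKnown = fresh;
            }
        } catch (UnknownHostException e) {
            // Lookup failed: keep using the previously resolved members.
        }
        return new ArrayList<>(lastKnown);
    }
}
```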


 You might also look at this patch, we never committed it but it might be
 interesting to you:
 https://issues.apache.org/jira/browse/ZOOKEEPER-146

 The benefit is that you'd only have one place to make the change, esp given
 that clients might be down/unreachable when this change occurs. Clients
 would have to poll this service whenever they get disconnected from the
 ensemble. One drawback of this approach is that the HTTP now becomes a
 potential SPOF. (although I guess you could always fall back to something,
 or potentially have a list of HTTP hosts to do the lookup, etc...).

Well, that just handles distribution of the list (which isn't really
our problem), it doesn't help with restarting the ZK client when the
list changes - it only pulls the list once, so you still have to
completely shutdown and restart the ZK client.


 It does sound interesting, however once we add something like this it's hard
 to change given that we try very hard to maintain b/w compatibility. If you
 did the testing and were able to verify I don't see why we couldn't add it -
 as it's optional in the sense that it would only be called in the use case
 you describe. I would feel more confident if we had more concrete detail on
 how we intend to do 107 (a basic functional/design doc that at least reviews
 all the issues), and how this would fit in. But I don't see that should
 necessarily be a blocker (although others might feel differently).

Have you ever considered adding features like this via a protected
interface (i.e. they are useful but aren't fully standardized, so if a
client wants to use it they can sub-class ZK and make them public)?

The ability to dynamically modify the server list on the client side
seems like it would be required no matter what approach were taken to
dynamic clusters.

-Dave Wright


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Gustavo Niemeyer
 The ability to dynamically modify the server list on the client side
 seems like it would be required no matter what approach were taken to
 dynamic clusters.

 Hasn't come up before, but yes I agree it's a useful feature.

I agree with Dave that this is quite important for a truly dynamic
membership experience.  I think I improperly imagined the two as being
inherently part of the same problem before, but I see they could be
split into different ones now that you mention it.

-- 
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/identi.ca
http://niemeyer.net/twitter


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Dave Wright
 Well, that just handles distribution of the list (which isn't really
 our problem), it doesn't help with restarting the ZK client when the
 list changes - it only pulls the list once, so you still have to
 completely shutdown and restart the ZK client.


 Well the old server is being shutdown right? If the client were connected to
 that server this would force the client to reconnect to another server, what
 I was suggesting is that the client would ping the server lookup service
 as part of this. (so lookup on every disconnect say)

Perhaps we should clarify what you mean by client (..would ping..).
If you mean the ZK client library, then that would make sense - rather
than use a static list of servers, each time it was disconnected it
would refresh its list and pick one.
I took it to mean the client application (using the ZK library). The
issue is that the client application has no way to tell the ZK client
lib to use a different list of servers, other than a complete teardown
of the ZK object & session, which I'm trying to avoid.


 Hasn't come up before, but yes I agree it's a useful feature.

Ok, thanks. We don't have a specific ETA to implement it, I just
wanted to explore the option a bit before we finalized some aspects of
our design. Should we do the work I'll submit patches for the Java and
C client.

-Dave


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Patrick Hunt



On 05/03/2010 11:29 AM, Dave Wright wrote:

Well, that just handles distribution of the list (which isn't really
our problem), it doesn't help with restarting the ZK client when the
list changes - it only pulls the list once, so you still have to
completely shutdown and restart the ZK client.



Well the old server is being shutdown right? If the client were connected to
that server this would force the client to reconnect to another server, what
I was suggesting is that the client would ping the server lookup service
as part of this. (so lookup on every disconnect say)


Perhaps we should clarify what you mean by client (..would ping..).
If you mean the ZK client library, then that would make sense - rather
than use a static list of servers, each time it was disconnected it
would refresh its list and pick one.
I took it to mean the client application (using the ZK library). The
issue is that the client application has no way to tell the ZK client
lib to use a different list of servers, other than a complete teardown
of the ZK object & session, which I'm trying to avoid.



Yes, that's what I meant - we could update the ZK client lib to do this. 
It would be invisible to the client application (your code) itself.




Hasn't come up before, but yes I agree it's a useful feature.


Ok, thanks. We don't have a specific ETA to implement it, I just
wanted to explore the option a bit before we finalized some aspects of
our design. Should we do the work I'll submit patches for the Java and
C client.


That would be great.

Patrick



Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Patrick Hunt



On 05/03/2010 12:07 PM, Dave Wright wrote:


Yes, that's what I meant - we could update the ZK client lib to do this. It
would be invisible to the client application (your code) itself.


I don't think that's a bad idea, and the general approach in ZK-146 of
using an interface that gets called to retrieve the list of hosts
seems good (so that you aren't tied to a specific implementation of
hosts lists, be it HTTP or DNS). That said, I don't think the actual
implementation of ZK-146 is a good solution, since it only resolves
the host list once. An implementation that resolved it on each
disconnection would be better but require deeper changes to the
ClientCnxn.


You could update 146 as appropriate, handling changes to the ensemble 
members wasn't an original goal. Notice there was some discussion on how 
to do this in a way that would be as flexible as possible going forward, 
and so that we don't end up with all kinds of constructors (etc...) on 
top of ZK client for the different schemes. That is still a concern, 
something that we should come to agreement on before implementation is 
started I mean.


Patrick


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Dave Wright

 Yes, that's what I meant - we could update the ZK client lib to do this. It
 would be invisible to the client application (your code) itself.

I don't think that's a bad idea, and the general approach in ZK-146 of
using an interface that gets called to retrieve the list of hosts
seems good (so that you aren't tied to a specific implementation of
hosts lists, be it HTTP or DNS). That said, I don't think the actual
implementation of ZK-146 is a good solution, since it only resolves
the host list once. An implementation that resolved it on each
disconnection would be better but require deeper changes to the
ClientCnxn.
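
To make the "resolve on each disconnection" variant concrete, here is a small sketch. It is not the ZOOKEEPER-146 patch; `HostListSource` and `ReconnectingClient` are hypothetical names, and the source could be backed by HTTP, DNS or a static list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback: returns the current "host:port" entries. */
interface HostListSource {
    List<String> fetchHosts();
}

/** Refreshes its host list from the source every time the connection drops. */
class ReconnectingClient {
    private final HostListSource source;
    private List<String> hosts;
    private int index = -1;

    ReconnectingClient(HostListSource source) {
        this.source = source;
        this.hosts = new ArrayList<>(source.fetchHosts());
    }

    /** Called on every disconnect: refresh first, then pick the next host. */
    synchronized String onDisconnect() {
        List<String> fresh = source.fetchHosts();
        if (fresh != null && !fresh.isEmpty()) {
            hosts = new ArrayList<>(fresh);   // keep the old list if lookup is empty
        }
        if (hosts.isEmpty()) {
            throw new IllegalStateException("no hosts available");
        }
        index = (index + 1) % hosts.size();
        return hosts.get(index);
    }
}
```

The application never sees the refresh; it only notices the usual disconnect and reconnect events.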

-Dave


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Henry Robinson
On 3 May 2010 16:40, Dave Wright wrig...@gmail.com wrote:

  Should this be a znode in the privileged namespace?
 

 I think having a znode for the current cluster members is part of the
 ZOOKEEPER-107 proposal, with the idea being that you could get/set the
 membership just by writing to that node. On the client side, you could
 watch that znode and update your server list when it changes.



This is tricky: what happens if the server your client is connected to is
decommissioned by a view change, and you are unable to locate another server
to connect to because other view changes committed while you are
reconnecting have removed all the servers you knew about? We'd need to make
sure that watches on this znode were fired before a view change, but it's
hard to know how to avoid having to wait for a session timeout before a
client that might just be migrating servers reappears in order to make sure
it sees the view change.

Even then, the problem of 'locating' the cluster still exists in the case
that there are no clients connected to tell anyone about it.

Henry


-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Dave Wright
 Should this be a znode in the privileged namespace?


I think having a znode for the current cluster members is part of the
ZOOKEEPER-107 proposal, with the idea being that you could get/set the
membership just by writing to that node. On the client side, you could
watch that znode and update your server list when it changes. I think
it would be a great solution, but I was thinking the ability to
manually manage the server list would be useful in the interim, or if
ZK-107 takes a different path.
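
A sketch of what the client side of that could look like. The ZooKeeper watch itself is omitted; `update()` stands for whatever the watch callback would call with the new znode contents. The comma-separated "host:port" data format and the `MembershipView` name are assumptions, since ZOOKEEPER-107 has not defined either.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical client-side view of the ensemble, fed from a membership znode. */
class MembershipView {
    private Set<String> servers = new LinkedHashSet<>();

    /** Parses znode data of the assumed form "host1:port,host2:port". */
    static Set<String> parse(String data) {
        Set<String> out = new LinkedHashSet<>();
        for (String part : data.split(",")) {
            String entry = part.trim();
            if (!entry.isEmpty()) {
                out.add(entry);
            }
        }
        return out;
    }

    /**
     * Applies new znode contents and returns the servers that were removed,
     * so the caller can disconnect if its current server is among them.
     * Empty data is ignored rather than wiping the list.
     */
    synchronized Set<String> update(String data) {
        Set<String> fresh = parse(data);
        Set<String> removed = new LinkedHashSet<>();
        if (fresh.isEmpty()) {
            return removed;
        }
        removed.addAll(servers);
        removed.removeAll(fresh);
        servers = fresh;
        return removed;
    }

    synchronized Set<String> servers() {
        return new LinkedHashSet<>(servers);
    }
}
```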

-Dave


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Dave Wright


 This is tricky: what happens if the server your client is connected to is
 decommissioned by a view change, and you are unable to locate another server
 to connect to because other view changes committed while you are
 reconnecting have removed all the servers you knew about? We'd need to make
 sure that watches on this znode were fired before a view change, but it's
 hard to know how to avoid having to wait for a session timeout before a
 client that might just be migrating servers reappears in order to make sure
 it sees the view change.

 Even then, the problem of 'locating' the cluster still exists in the case
 that there are no clients connected to tell anyone about it.

Yes, this doesn't completely solve two issues:
1. Bootstrapping the cluster itself & clients
2. Major cluster reconfiguration (e.g. switching out every node before
clients can pickup the changes).

That said, I think it gets close and could still be useful.
For #1, you could simply require that the initial servers in the
cluster be manually configured, then servers could be added and
removed as needed. New servers would just need the address of one
other server to join and get the full server list. For clients,
you'd have a similar situation - you still need a way to pass an
initial server list (or at least 1 valid server) in to the client, but
that could be via HTTP, DNS, or manual list, then the clients
themselves could stay in sync with changes.
For #2, you could simply document that there are limits to how fast
you want to change the cluster, and that if you make too many changes
too fast, clients or servers may not pick up the change fast enough
and need to be restarted. In reality I don't think this will be much
of an issue - as long as at least one server from the starting state
stays up until everyone else gets reconnected, everyone should
eventually find that node and get the full server list.
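
The condition in #2 can be stated as a simple check: a client that was disconnected during a reconfiguration can find the cluster again only if at least one server it knows about is still in the current view. `ViewChangeCheck` is a hypothetical helper for illustration, not an existing ZooKeeper class.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper for reasoning about how fast the cluster may change. */
class ViewChangeCheck {
    /**
     * True if the client's last known server list shares at least one
     * member with the cluster's current view, i.e. the client still has
     * somewhere to reconnect and fetch the full list.
     */
    static boolean canReconnect(Set<String> knownToClient, Set<String> currentView) {
        return !Collections.disjoint(knownToClient, currentView);
    }
}
```

With a known list of a/b/c, a current view of c/x/y still lets the client back in through c, while a view of x/y/z strands it until it is restarted with a new list.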

-Dave