Hi Patrick,

Thanks for the info - the Fallacies link especially. As you might have guessed, I am one of the programmers new to distributed computing who is very much in danger of messing things up.
I am going to have to knuckle down and do some experiments. Thankfully, I don't think my requirements will stretch ZooKeeper even if I take a heavy-handed approach.

regards,
Martin

On 24 February 2010 16:53, Patrick Hunt <ph...@apache.org> wrote:
> Martin Waite wrote:
>> The watch mechanism is a new feature for me. This gives me a delayed
>> notification that something changed in the lock directory, and so is the
>> earliest time that it makes sense to retry my lock acquisition. However,
>> given the time-delay in getting the notification, the freed lock might
>> have been acquired by someone else before I get there. In which case, I
>> might as well just keep trying to acquire locks at random until my time
>> budget is exhausted and not bother with the watch?
>
> I don't see the benefit of what Mahadev/Ted are suggesting vs Martin's
> original proposal. Perhaps I'm missing something, please correct me if
> I'm wrong, but it seems to me that you want two "lists": a list of
> resources and a list of locks. Resources might be added or removed
> dynamically over time (assuming they are not known a priori); locks are
> short-lived and exclusive. To me this suggests:
>
> /resources/resource_### (ephem? owned by the resource itself)
> /locks/resource_### (ephem)
>
> where the available resources are managed by adding/removing from
> /resources. Anyone interested in locking an explicit resource attempts
> to create an ephemeral node in /locks with the same ### as the resource
> they want access to. If interested in just getting "any" resource, then
> you would getchildren(/resources) and getchildren(/locks) and attempt to
> lock anything not in the intersection (avail). This could be done
> efficiently since resources won't change much; just cache the results of
> getchildren and set a watch at the same time. To lock a resource,
> randomize "avail" and attempt to lock each in turn.
> If all avail fail to acquire the lock, then have some random holdoff
> time, then re-getchildren(locks) and start over.
>
> Distributed computing is inherently "delayed" http://bit.ly/chhFrS
> right? ;-) The benefit of the watch is typically that it minimizes load
> on the service - notification vs polling.
>
>> Are watches triggered as soon as the primary controller applies a
>> change to an object - or are they delivered whenever the client's local
>> zk instance replicates the change at some later time?
>
> They are not synchronous in the sense you mean. You are guaranteed that
> all clients see all changes in the same order, but not
> synchronously/instantaneously.
>
> This stackoverflow page has some good detail; see Ben's comment here:
> http://bit.ly/aaMzHY
>
>> Is there a feature to introduce deliberate lag between the primary and
>> its replicas in the ensemble - for development purposes? That could be
>> useful for exposing latency assumptions.
>
> No feature, but it does sound interesting. Are there any tools that
> allow one to set up "slow pipes" a la stunnel, but here for latency, not
> encryption? I believe FreeBSD has this feature at the OS (firewall?)
> level; I don't know if Linux does.
>
> Patrick
>
>> On 24 February 2010 06:05, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> You have to be careful there of race conditions. ZK's slightly
>>> surprising API makes it pretty easy to get this right, however.
>>>
>>> The correct way to do what you suggest is to read the list of children
>>> in the locks directory and put a watch on the directory at the same
>>> time. If the number of locks equals the number of resources, you wait.
>>> If it is less, you can pick one of the apparently unlocked resources
>>> at random. If you fail, start again by checking the number of
>>> resources.
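Patrick's two-list scheme above can be sketched in plain Java. This is a minimal, standalone illustration of the "avail = resources minus locks, shuffle, try each" logic; the class and method names are my own, and the actual ZooKeeper calls (getChildren with a watch, create with CreateMode.EPHEMERAL) are shown only in comments so the sketch compiles without a live ensemble:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RandomLockOrder {
    // avail = children(/resources) minus children(/locks). In a live client
    // the two input sets would come from zk.getChildren("/resources", true)
    // and zk.getChildren("/locks", true); the boolean also sets a watch, so
    // cached results stay fresh via notifications.
    static List<String> available(Set<String> resources, Set<String> locks) {
        Set<String> avail = new TreeSet<>(resources);
        avail.removeAll(locks);
        return new ArrayList<>(avail);
    }

    public static void main(String[] args) {
        Set<String> resources = Set.of("resource_001", "resource_002", "resource_003");
        Set<String> locks = Set.of("resource_002");

        List<String> avail = available(resources, locks);
        Collections.shuffle(avail);   // randomise to spread contention across clients
        for (String r : avail) {
            // In a live client, attempt:
            //   zk.create("/locks/" + r, new byte[0],
            //             ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // A NodeExistsException means another client won the race for this
            // resource; move on and try the next candidate.
            System.out.println("would try /locks/" + r);
        }
        // If every candidate fails: random holdoff, re-read /locks, start over.
    }
}
```

The ephemeral mode is what makes this safe against crashed clients: the lock node vanishes when the holder's session ends, so no separate cleanup is needed.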
>>> On Tue, Feb 23, 2010 at 9:09 PM, Martin Waite
>>> <waite....@googlemail.com> wrote:
>>>> I guess another optimisation might be to count the number of locks
>>>> held first: if the count equals the number of resources, try again
>>>> later. But I suppose that might require a sync call first to ensure
>>>> that the zk instance my client is connected to is up to date.
>>>
>>> --
>>> Ted Dunning, CTO
>>> DeepDyve
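Ted's count-then-wait optimisation can be sketched as follows. The names here are hypothetical, and the ZooKeeper watch is simulated with a CountDownLatch standing in for the Watcher callback, since the point is the control flow: read the lock count and set the watch in the same getChildren call, and only scan for a free resource when the count says one might exist:

```java
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockWait {
    // Ted's optimisation: if every resource is locked, don't bother scanning
    // for a free one; wait for the watch to fire instead. In a live client,
    // both sets come from zk.getChildren(path, watcher), which reads the
    // children and registers the watch atomically, so no change can slip in
    // between the read and the watch registration (the race Ted warns about).
    static boolean shouldAttempt(Set<String> resources, Set<String> locks) {
        return locks.size() < resources.size();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> resources = Set.of("resource_001", "resource_002");
        Set<String> allLocked = Set.of("resource_001", "resource_002");

        if (!shouldAttempt(resources, allLocked)) {
            // Stand-in for the ZooKeeper watch: in a real client the Watcher
            // callback counts this latch down on a NodeChildrenChanged event.
            CountDownLatch watchFired = new CountDownLatch(1);
            watchFired.countDown();                 // simulate the event arriving
            watchFired.await(5, TimeUnit.SECONDS);  // then re-read /locks and retry
            System.out.println("watch fired, retrying");
        }
    }
}
```

Note Martin's caveat still applies: a plain getChildren reads from the client's local server, which can lag the leader, so the count can be momentarily stale; a sync call before reading forces the local server to catch up if that matters.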