Yes. That works fine with idempotent operations like creating a non-sequential file.
Of course, it doesn't work with sequential files, since you don't know who created any other znodes out there.

On Fri, Oct 14, 2011 at 9:39 AM, Jordan Zimmerman <jzimmer...@netflix.com> wrote:

> FYI - Curator checks for KeeperException.Code.NODEEXISTS in its retry loop
> and just ignores it, treating it as a success. I'm not sure if other
> libraries do that. So, this is a case where a disconnection can be handled
> generically.
>
> -JZ
>
> On 10/14/11 7:20 AM, "Fournier, Camille F." <camille.fourn...@gs.com>
> wrote:
>
> >Pretty much all of the Java client wrappers out there in the wild have
> >some sort of a retry loop around operations, to make some of this easier
> >to deal with. But they don't, to my knowledge, deal with the situation of
> >knowing whether an operation succeeded in the case of a disconnect (it is
> >possible to push out a request and get a disconnect back before you get
> >a response for that request, so you don't know if your request succeeded
> >or failed). So you may end up, for example, writing something twice in
> >the case of writing a SEQUENTIAL-type node. For many use cases of
> >sequential, this isn't a big deal.
> >
> >I don't know of anything that handles this in a more subtle way than
> >simply retrying. As Ted has mentioned in earlier emails on the subject,
> >"You can't just assume that you can retry an operation on ZooKeeper and
> >get the right result. The correct handling is considerably more subtle.
> >Hiding that is not a good thing unless you say right up front that you
> >are compromising either expressivity (as does Kept Collections) or
> >correctness (as does zkClient)."
> >
> >It's not clear to me that it is possible to write a generic client to
> >"correctly" handle retries on disconnect, because what correct means
> >varies from use case to use case.
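The duplicate-sequential-write problem described above has a known client-side mitigation: embed a unique ID in the node name, and before retrying a create after a connection loss, list the children to see whether the earlier request actually landed. (Curator implements a variant of this as "protected" creates.) A minimal sketch, using a stub client in place of a real ZooKeeper handle, with illustrative names of my own choosing:

```python
import uuid

class StubClient:
    """Stub for a ZooKeeper handle: remembers created sequential nodes."""
    def __init__(self):
        self.children = []
        self.counter = 0

    def create_sequential(self, prefix):
        # Server appends a monotonically increasing suffix, like ZK does.
        self.counter += 1
        name = "%s%010d" % (prefix, self.counter)
        self.children.append(name)
        return name

    def get_children(self):
        return list(self.children)

def safe_sequential_create(client, session_id, base="lock-"):
    """Create a sequential node at most once, even across retries."""
    prefix = "%s%s-" % (base, session_id)
    # Before (re)trying, check whether a previous attempt already succeeded
    # but the reply was lost to a disconnect.
    for child in client.get_children():
        if child.startswith(prefix):
            return child
    return client.create_sequential(prefix)

client = StubClient()
sid = uuid.uuid4().hex
first = safe_sequential_create(client, sid)
# A retry after connection loss finds the existing node: no duplicate.
second = safe_sequential_create(client, sid)
assert first == second
assert len(client.get_children()) == 1
```

With a real client the same logic would wrap `create(..., sequence=True)` and `get_children()`; the point is only that the unique prefix makes the otherwise ambiguous retry answerable.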
> >One of the challenges, I think, for getting comfortable with using ZK is
> >knowing the correctness bounds for your particular use case and
> >understanding the failure scenarios wrt that use case and ZK.
> >
> >C
> >
> >
> >-----Original Message-----
> >From: Mike Schilli [mailto:m...@perlmeister.com]
> >Sent: Thursday, October 13, 2011 9:27 PM
> >To: user@zookeeper.apache.org
> >Subject: Re: Locks based on ephemeral nodes - Handling network outage
> >correctly
> >
> >On Wed, 12 Oct 2011, Ted Dunning wrote:
> >
> >> ZK will tell you when the connection is lost (but not yet expired).
> >> When this happens, the application needs to pay attention and pause
> >> before continuing to assume it still has the lock.
> >
> >I think this applies to every write operation in ZooKeeper, which I find
> >is a challenge to deal with.
> >
> >So basically, every time an application writes something to ZooKeeper,
> >it needs to check the result, but what to do if it fails? Check if it's
> >an error indicating the connection was lost, and try a couple of times
> >to reinstate the connection and replay the write? At least, that's what
> >the documentation of the Perl wrapper in Net::ZooKeeper suggests.
> >
> >Are there best practices around this, or, better yet, a client API that
> >actually implements this, so the application doesn't have to implement
> >a ZooKeeper wrapper? Something like "retry 3 times with 10-second waits
> >in between and fail otherwise".
> >
> >--
> >-- Mike
> >
> >Mike Schilli
> >m...@perlmeister.com
> >
> >
> >> 2011/10/12 Frédéric Jolliton <frede...@jolliton.com>
> >>
> >>> Hello all,
> >>>
> >>> There is something that bothers me about ephemeral nodes.
> >>>
> >>> I need to create some locks using Zookeeper. I followed the "official"
> >>> recipe, except that I don't use the EPHEMERAL flag. The reason for
> >>> that is that I don't know how I should proceed if the connection to
> >>> the Zookeeper ensemble is ever lost.
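The policy Mike asks for ("retry 3 times with 10-second waits, then fail") is straightforward to sketch as a wrapper; the names below are illustrative, not from any real library. It also treats NodeExists as success the way Jordan describes Curator doing, which, per the discussion above, is only correct for idempotent operations:

```python
import time

class ConnectionLoss(Exception):
    pass

class NodeExists(Exception):
    pass

def with_retries(op, attempts=3, wait_seconds=10):
    """Run op(), retrying on connection loss; fail after `attempts` tries.
    NodeExists is treated as success - safe only for idempotent creates."""
    last = None
    for i in range(attempts):
        try:
            return op()
        except NodeExists:
            return None  # node is there; treat as success, as Curator does
        except ConnectionLoss as e:
            last = e
            if i < attempts - 1:
                time.sleep(wait_seconds)
    raise last

# Demo with a fake operation that loses the connection twice, then works.
state = {"calls": 0}
def flaky_create():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionLoss()
    return "/app/node"

assert with_retries(flaky_create, attempts=3, wait_seconds=0) == "/app/node"
assert state["calls"] == 3
```

This hides the mechanics but, as Camille and Ted note, not the semantics: whether a blind replay is *correct* still depends on the operation being retried.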
> >>> But otherwise, everything works nicely.
> >>>
> >>> The EPHEMERAL flag is useful if the owner of the lock disappears
> >>> (exiting abnormally). From the point of view of the Zookeeper
> >>> ensemble, the connection times out (or is closed explicitly) and the
> >>> lock is released. That's great.
> >>>
> >>> However, if I lose the connection temporarily (network outage), the
> >>> Zookeeper ensemble again sees the connection timing out... but
> >>> actually the owner of the lock is still there, doing some work on the
> >>> locked resource. But the lock is released by Zookeeper anyway.
> >>>
> >>> How should this case be handled?
> >>>
> >>> All I can see is that the owner can only verify that the lock was no
> >>> longer owned because releasing the lock will give a Session Expired
> >>> error (assuming we retry reconnecting while we get a Connection Loss
> >>> error), or because of an event sent at some point because the
> >>> connection was also closed automatically on the client side by
> >>> libkeeper (not sure about this last point). Knowing that the
> >>> connection expired necessarily means that the lock was lost, but it
> >>> may be too late.
> >>>
> >>> I mean that there is a short time lapse where the process that owns
> >>> the lock has not tried to release it yet, and thus doesn't know it
> >>> lost it, while another process was able to acquire it in the
> >>> meantime. This is a big problem.
> >>>
> >>> That's why I avoid the EPHEMERAL flag for now, and plan to rely on a
> >>> periodic cleaning task to drop locks no longer owned by some process
> >>> (a task which is not trivial either).
> >>>
> >>> I would appreciate any tips on handling such a situation in a better
> >>> way. What is your experience in such cases?
> >>>
> >>> Regards,
> >>>
> >>> --
> >>> Frédéric Jolliton
> >>> Outscale SAS
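Ted's advice at the top of this sub-thread — pause on disconnect, treat expiry as the lock definitively lost — amounts to a small state machine driven by the session events ZooKeeper already delivers (Disconnected, SyncConnected, Expired). A sketch of that machine; the `LockHolder` class is my own illustration, not an API from any client library:

```python
CONNECTED, SUSPENDED, LOST = "connected", "suspended", "lost"

class LockHolder:
    """Tracks whether an ephemeral-node lock may still be trusted."""
    def __init__(self):
        self.state = CONNECTED

    def on_session_event(self, event):
        if event == "Disconnected":
            # We might still hold the lock server-side, but must assume not.
            self.state = SUSPENDED
        elif event == "SyncConnected":
            if self.state == SUSPENDED:
                # Session survived the outage: the ephemeral node, and
                # therefore the lock, was never released.
                self.state = CONNECTED
        elif event == "Expired":
            # Session gone => ephemeral node gone => lock lost for good.
            self.state = LOST

    def may_touch_resource(self):
        # Only act on the protected resource while provably connected.
        return self.state == CONNECTED

holder = LockHolder()
assert holder.may_touch_resource()
holder.on_session_event("Disconnected")
assert not holder.may_touch_resource()   # pause, as Ted advises
holder.on_session_event("SyncConnected")
assert holder.may_touch_resource()       # session survived; lock still held
holder.on_session_event("Expired")
assert not holder.may_touch_resource()   # must re-acquire from scratch
```

Pausing on Disconnected narrows, but cannot fully close, the window Frédéric describes: the client can only learn of expiry after the fact, so work already in flight when the session died may still overlap with a new lock holder.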