Correct. So you can't actually implement correct retry logic for some error conditions.
On Fri, Oct 14, 2011 at 9:44 AM, Jordan Zimmerman <jzimmer...@netflix.com> wrote:

> True. But, it wouldn't be possible to get KeeperException.Code.NODEEXISTS
> for sequential files, right?
>
> -JZ
>
> On 10/14/11 9:41 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>
> > Yes. That works fine with idempotent operations like creating a
> > non-sequential file.
> >
> > Of course, it doesn't work with sequential files since you don't know
> > who created any other znodes out there.
> >
> > On Fri, Oct 14, 2011 at 9:39 AM, Jordan Zimmerman
> > <jzimmer...@netflix.com> wrote:
> >
> > > FYI - Curator checks for KeeperException.Code.NODEEXISTS in its retry
> > > loop and just ignores it, treating it as a success. I'm not sure if
> > > other libraries do that. So, this is a case where a disconnection can
> > > be handled generically.
> > >
> > > -JZ
> > >
> > > On 10/14/11 7:20 AM, "Fournier, Camille F." <camille.fourn...@gs.com>
> > > wrote:
> > >
> > > > Pretty much all of the Java client wrappers out there in the wild
> > > > have some sort of a retry loop around operations, to make some of
> > > > this easier to deal with. But they don't, to my knowledge, deal with
> > > > the situation of knowing whether an operation succeeded in the case
> > > > of a disconnect (it is possible to push out a request and get a
> > > > disconnect back before you get a response for that request, so you
> > > > don't know if your request succeeded or failed). So you may end up,
> > > > for example, writing something twice in the case of writing a
> > > > SEQUENTIAL-type node. For many use cases of sequential, this isn't a
> > > > big deal.
> > > >
> > > > I don't know of anything that handles this in a more subtle way than
> > > > simply retrying. As Ted has mentioned in earlier emails on the
> > > > subject: "You can't just assume that you can retry an operation on
> > > > ZooKeeper and get the right result. The correct handling is
> > > > considerably more subtle.
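[A known mitigation for the sequential-create ambiguity discussed above (it comes from the ZooKeeper lock recipe documentation, not from this thread) is to embed a client-generated GUID in the node name, so that after a connection loss the client can list the children and recognize whether its own create actually landed. The sketch below shows only that naming logic; the lock path and `lock-` prefix are illustrative, and a real client would pass the prefix to `create()` with the sequential flag and call `getChildren()` after reconnecting.]

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Sketch of the GUID-in-the-node-name technique for sequential creates.
public class GuidNodeNames {

    // Prefix passed to create(..., EPHEMERAL_SEQUENTIAL); the server
    // appends a 10-digit sequence, e.g. "lock-<guid>-0000000003".
    public static String nodePrefix(String guid) {
        return "lock-" + guid + "-";
    }

    // After a connection loss during create(), list the children and look
    // for a node carrying our GUID: if one exists, the earlier create
    // succeeded and must not be retried.
    public static Optional<String> findOwnNode(List<String> children,
                                               String guid) {
        String marker = "-" + guid + "-";
        return children.stream().filter(c -> c.contains(marker)).findFirst();
    }

    public static String newGuid() {
        return UUID.randomUUID().toString();
    }
}
```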
> > > > Hiding that is not a good thing unless you say right up front that
> > > > you are compromising either expressivity (as does Kept Collections)
> > > > or correctness (as does zkClient)."
> > > >
> > > > It's not clear to me that it is possible to write a generic client
> > > > to "correctly" handle retries on disconnect, because what correct
> > > > means varies from use case to use case. One of the challenges, I
> > > > think, for getting comfortable with using ZK is knowing the
> > > > correctness bounds for your particular use case and understanding
> > > > the failure scenarios wrt that use case and ZK.
> > > >
> > > > C
> > > >
> > > > -----Original Message-----
> > > > From: Mike Schilli [mailto:m...@perlmeister.com]
> > > > Sent: Thursday, October 13, 2011 9:27 PM
> > > > To: user@zookeeper.apache.org
> > > > Subject: Re: Locks based on ephemeral nodes - Handling network
> > > > outage correctly
> > > >
> > > > On Wed, 12 Oct 2011, Ted Dunning wrote:
> > > >
> > > > > ZK will tell you when the connection is lost (but not yet
> > > > > expired). When this happens, the application needs to pay
> > > > > attention and pause before continuing to assume it still has the
> > > > > lock.
> > > >
> > > > I think this applies to every write operation in ZooKeeper, which I
> > > > find is a challenge to deal with.
> > > >
> > > > So basically, every time an application writes something to
> > > > ZooKeeper, it needs to check the result, but what to do if it
> > > > fails? Check if it's an error indicating the connection was lost,
> > > > and try a couple of times to reinstate the connection and replay
> > > > the write? At least, that's what the documentation of the Perl
> > > > wrapper in Net::ZooKeeper suggests.
> > > >
> > > > Are there best practices around this, or, better yet, a client API
> > > > that actually implements this, so the application doesn't have to
> > > > implement a ZooKeeper wrapper? Something like "retry 3 times with
> > > > 10 second waits in between and fail otherwise".
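[The "retry N times with a fixed wait" policy Mike asks about, combined with the NODEEXISTS-as-success behavior Jordan describes for Curator, can be sketched as below. The exception classes and method names here are stand-ins, not the real `org.apache.zookeeper.KeeperException` hierarchy; an actual wrapper would catch `KeeperException.ConnectionLossException` and `KeeperException.NodeExistsException` instead. Note the caveat the thread keeps returning to: this is only safe for idempotent operations, not sequential creates.]

```java
import java.util.concurrent.Callable;

// Sketch of a bounded retry loop for idempotent ZooKeeper operations.
public class RetryLoop {

    public static class ConnectionLoss extends Exception {}
    public static class NodeExists extends Exception {}

    // Retry an operation that is safe to repeat. A NodeExists on retry is
    // treated as success (returning null), mirroring what the thread says
    // Curator does for non-sequential creates.
    public static <T> T retryIdempotent(Callable<T> op, int maxRetries,
                                        long waitMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (NodeExists e) {
                return null;       // an earlier attempt already succeeded
            } catch (ConnectionLoss e) {
                last = e;          // ambiguous: the write may have landed
                Thread.sleep(waitMillis);
            }
        }
        throw last;                // give up after maxRetries
    }
}
```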
> > > > --
> > > > -- Mike
> > > >
> > > > Mike Schilli
> > > > m...@perlmeister.com
> > > >
> > > > > 2011/10/12 Frédéric Jolliton <frede...@jolliton.com>
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > There is something that bothers me about ephemeral nodes.
> > > > > >
> > > > > > I need to create some locks using ZooKeeper. I followed the
> > > > > > "official" recipe, except that I don't use the EPHEMERAL flag.
> > > > > > The reason for that is that I don't know how I should proceed
> > > > > > if the connection to the ZooKeeper ensemble is ever lost. But
> > > > > > otherwise, everything works nicely.
> > > > > >
> > > > > > The EPHEMERAL flag is useful if the owner of the lock
> > > > > > disappears (exiting abnormally). From the point of view of the
> > > > > > ZooKeeper ensemble, the connection times out (or is closed
> > > > > > explicitly), and the lock is released. That's great.
> > > > > >
> > > > > > However, if I lose the connection temporarily (network outage),
> > > > > > the ZooKeeper ensemble again sees the connection timing out,
> > > > > > but actually the owner of the lock is still there doing some
> > > > > > work on the locked resource. The lock is released by ZooKeeper
> > > > > > anyway.
> > > > > >
> > > > > > How should this case be handled?
> > > > > >
> > > > > > All I can see is that the owner can only verify that the lock
> > > > > > was no longer owned because releasing the lock will give a
> > > > > > Session Expired error (assuming we retry reconnecting while we
> > > > > > get a Connection Loss error), or because of an event sent at
> > > > > > some point because the connection was also closed automatically
> > > > > > on the client side by the client library (not sure about this
> > > > > > last point). Knowing that the connection expired necessarily
> > > > > > means that the lock was lost, but it may be too late.
> > > > > > I mean that there is a short time lapse where the process that
> > > > > > owns the lock has not tried to release it yet, and thus doesn't
> > > > > > know it lost it, and another process was able to acquire it in
> > > > > > the meantime. This is a big problem.
> > > > > >
> > > > > > That's why I avoid the EPHEMERAL flag for now, and plan to rely
> > > > > > on a periodic cleaning task to drop locks no longer owned by
> > > > > > some process (a task which is not trivial either).
> > > > > >
> > > > > > I would appreciate any tips for handling such a situation in a
> > > > > > better way. What is your experience in such cases?
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > Frédéric Jolliton
> > > > > > Outscale SAS
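[Ted's advice earlier in the thread ("pause before continuing to assume it still has the lock") can be sketched as a small state tracker that a lock holder consults before each operation on the protected resource. The state names mirror ZooKeeper's `Watcher.Event.KeeperState` values (Disconnected, SyncConnected, Expired), but this class itself is hypothetical; in real code the three callbacks would be driven by the connection watcher. It does not close the window Frédéric describes, where another process acquires the lock before the old holder learns its session expired, but it makes the holder stop work as soon as the outcome becomes uncertain.]

```java
// Sketch: a guard a lock holder checks before touching the resource.
public class LockGuard {

    public enum State { CONNECTED, SUSPENDED, EXPIRED }

    private State state = State.CONNECTED;

    public void onDisconnected() {      // session may still be alive
        if (state == State.CONNECTED) state = State.SUSPENDED;
    }

    public void onReconnected() {       // session survived the outage
        if (state == State.SUSPENDED) state = State.CONNECTED;
    }

    public void onSessionExpired() {    // ephemeral lock node is gone
        state = State.EXPIRED;
    }

    // While SUSPENDED the holder must pause; once EXPIRED it must assume
    // another process may already hold the lock.
    public boolean maySafelyProceed() {
        return state == State.CONNECTED;
    }

    public State state() { return state; }
}
```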