Re: Too many "KeeperErrorCode = Session moved" messages

2010-08-07 Thread Patrick Hunt
I suspect this is a bug with the sync call and session moved (the code 
path for sync is a bit special). Please enter a JIRA for this. Thanks.


Patrick

On 08/05/2010 01:20 PM, Vishal K wrote:

Hi All,

I am seeing a lot of these messages in our application. I would like to know
whether I am doing something wrong or whether this is a ZK bug.

Setup:
- Server environment:zookeeper.version=3.3.0-925362
- 3 node cluster
- Each node has a few clients that connect to the local server using 127.0.0.1
as the host IP.
- The application first forms a ZK cluster. Once the ZK cluster is formed,
each node establishes sessions with its local ZK server. The clients do not
know about the remote servers, so sessions are always with the local server.

As soon as the ZK clients connect to their respective followers, the ZK leader
starts spitting out the following messages:

2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,748 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x9 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,755 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0xb zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,795 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x10 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,850 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x1 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,910 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x1b zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,920 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x20 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,019 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x29 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,030 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2c zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,035 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2e zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,065 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x33 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:38,840 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x4 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved

These sessions were established on the follower:
2010-07-01 08:59:09,890 - INFO  [CommitProcessor:0:nioserverc...@1431] -
Established session 0x298d3b1fa9 with negotiated timeout 9000 for client
/127.0.0.1:50773
2010-07-01 08:59:09,890 - INFO
[SvaDefaultBLC-SendThread(localhost.localdom:2181):clientcnxn$sendthr...@701]
- Session establishment complete on server localhost.localdom/127.0.0.1:2181,
sessionid = 0x298d3b1fa9, negotiated timeout = 9000


The leader is spitting out these messages for every session that it does not
own (sessions established by clients with followers). The messages are
always seen for a sync request.
No other issues are seen with the cluster. I am wondering what would be the
cause of this problem? Looking at PrepRequestProcessor, it seems like this
message is printed when the owner of the reques

Re: Sequence Number Generation With Zookeeper

2010-08-07 Thread Me
Hi all,

We have something implementing the optimistic concurrency approach to
sequence generation that we've been running in production for some time now.
We don't see a huge amount of contention over the sequence counters as the
nature of our app lends itself well to partitioned keys. Initially, we coded
up the simplest thing we thought could work and deployed it, figuring that
we'd have plenty of scope for improvement once we saw it running with real
load. However, to date it's been ticking over so well we've not really had
cause to spend any further effort on it.

There's plenty of scope for improvement though. Two of the things we had
thought we would need to do sooner rather than later are to implement an
exponential backoff scheme (like Ted describes) when there is contention
over a given counter, and to add a more performant network interface than
HTTP. As I say, though, this just hasn't been a high enough priority for us
yet.
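
For readers following along, here is a minimal sketch of optimistic
concurrency with exponential backoff on a single counter. It is not the H1
code. InMemoryCounterStore is a hypothetical in-memory stand-in for a
versioned znode, and the names next_sequence, set_if_version, base_delay and
max_delay are assumptions made for illustration only.

```python
import random
import threading
import time


class BadVersionError(Exception):
    """Raised when a conditional write names a stale version."""


class InMemoryCounterStore:
    """Hypothetical stand-in for a versioned znode; not a ZooKeeper client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0
        self._version = 0

    def get(self):
        """Return the current (value, version) pair."""
        with self._lock:
            return self._value, self._version

    def set_if_version(self, value, expected_version):
        """Write only if nobody else has written since we read."""
        with self._lock:
            if expected_version != self._version:
                raise BadVersionError(expected_version)
            self._value = value
            self._version += 1


def next_sequence(store, max_attempts=10, base_delay=0.001, max_delay=0.1):
    """Read, increment, and conditionally write; back off on contention."""
    for attempt in range(max_attempts):
        value, version = store.get()
        try:
            store.set_if_version(value + 1, version)
            return value + 1
        except BadVersionError:
            # Another writer won the race. Sleep for a random time whose
            # upper bound doubles on each failed attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    raise RuntimeError("gave up after %d contended attempts" % max_attempts)
```

The random jitter keeps clients that collided once from retrying in lockstep
and colliding again.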

Anyway, we've been meaning to open source this for a while now, and prompted
by this thread, I just spent an afternoon tidying up a little and pushing to
github. It's at http://github.com/talisplatform/H1 and any feedback would be
gratefully received.

Cheers,
Sam

On 7 August 2010 03:40, Ted Dunning wrote:

> Tell him that we will all look over your code so he gets immediate free
> consulting.
>
> On Fri, Aug 6, 2010 at 7:39 PM, David Rosenstrauch wrote:
>
> > I'll run it by my boss next week.
> >
> > DR
> >
> >
> > On 08/06/2010 07:30 PM, Mahadev Konar wrote:
> >
> >> Hi David,
> >>  I think it would be really useful. It would be very helpful for someone
> >> looking to generate unique tokens/generation IDs (I can think of plenty
> >> of applications for this).
> >>
> >> Please do consider contributing it back to the community!
> >>
> >> Thanks
> >> mahadev
> >>
> >>
> >> On 8/6/10 7:10 AM, "David Rosenstrauch" wrote:
> >>
> >>> Perhaps.  I'd have to ask my boss for permission to release the code.
> >>>
> >>> Is this something that would be interesting/useful to other people?  If
> >>> so, I can ask about it.
> >>>
> >>> DR
> >>>
> >>> On 08/05/2010 11:02 PM, Jonathan Holloway wrote:
> >>>
> >>>> Hi David,
> >>>>
> >>>> We did discuss potentially doing this as well.  It would be nice to get
> >>>> some recipes for Zookeeper done for this area, if people think it's
> >>>> useful.  Were you thinking of submitting this back as a recipe?  If not,
> >>>> I could potentially work on such a recipe instead.
> >>>>
> >>>> Many thanks,
> >>>> Jon.
> >>>>
> >>>>
> >>>>> I just ran into this exact situation, and handled it like so:
> >>>>>
> >>>>> I wrote a library that uses the option (b) you described above.  Only
> >>>>> instead of requesting a single sequence number, you request a block of
> >>>>> them at a time from Zookeeper, and then locally use them up one by one
> >>>>> from the block you retrieved.  Retrieving by block (e.g., by blocks of 1
> >>>>> at a time) eliminates the contention issue.
> >>>>>
> >>>>> Then, if you're finished assigning IDs from that block, but still have
> >>>>> a bunch of IDs left in the block, the library has another function to
> >>>>> "push back" the unused IDs.  They'll then get pulled again in the next
> >>>>> block retrieval.
> >>>>>
> >>>>> We don't actually have this code running in production yet, so I can't
> >>>>> vouch for how well it works.  But the design was reviewed and given the
> >>>>> thumbs up by the core developers on the team, and the implementation
> >>>>> passes all my unit tests.
> >>>>>
> >>>>> HTH.  Feel free to email back with specific questions if you'd like
> >>>>> more details.
> >>>>>
> >>>>> DR
> >>>>>
> 
>
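
The block-allocation scheme David describes in the quoted message above
might look roughly like the sketch below. This is not his library, which was
not posted to the list. InMemoryBlockStore is a hypothetical in-memory
stand-in for the shared state that would live in ZooKeeper, and the names
reserve, push_back, next_id, release_unused and the block size used in the
example are all assumptions.

```python
import threading
from collections import deque


class InMemoryBlockStore:
    """Hypothetical stand-in for the shared state kept in ZooKeeper."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 1          # first ID that has never been handed out
        self._returned = deque()   # pushed-back IDs, oldest first

    def reserve(self, size):
        """Atomically hand out `size` IDs, reusing pushed-back IDs first."""
        with self._lock:
            block = []
            while self._returned and len(block) < size:
                block.append(self._returned.popleft())
            fresh = size - len(block)
            block.extend(range(self._next_id, self._next_id + fresh))
            self._next_id += fresh
            return block

    def push_back(self, ids):
        """Return unused IDs so a later reserve() can hand them out again."""
        with self._lock:
            self._returned.extend(ids)


class BlockSequence:
    """Client-side allocator: one shared round trip per block of IDs."""

    def __init__(self, store, block_size):
        self._store = store
        self._block_size = block_size
        self._local = deque()

    def next_id(self):
        # Only touch the shared store when the local block is exhausted.
        if not self._local:
            self._local.extend(self._store.reserve(self._block_size))
        return self._local.popleft()

    def release_unused(self):
        """Push back whatever is left of the current block."""
        unused = list(self._local)
        self._local.clear()
        if unused:
            self._store.push_back(unused)
```

With a block size of 5, a client that uses IDs 1 and 2 and then calls
release_unused() returns 3, 4 and 5 to the store, and the next reserve()
hands those out before any fresh IDs. Note that IDs produced this way are
unique but are not issued in increasing order across clients.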