On 08/05/2010 06:31 PM, Jonathan Holloway wrote:
Hi all,

I'm looking at using Zookeeper for distributed sequence number generation.
  What's the best way to do this currently?  Is there a particular recipe
available for this?

My ideas so far involve:
a) Creating a node with PERSISTENT_SEQUENTIAL then deleting it - this gives
me the monotonically increasing number, but the sequence number isn't
contiguous
b) Storing the sequence number in the data portion of a persistent node -
then updating this (using the version number - aka optimistic locking).  The
problem with this is that under high load I'm assuming there'll be a lot of
contention and hence failures with regards to updates.

What are your thoughts on the above?

Many thanks,
Jon.

I just ran into this exact situation, and handled it like so:

I wrote a library that uses option (b) as you described it above. But instead of requesting a single sequence number, you request a block of them at a time from ZooKeeper, and then use them up locally, one by one, from the block you retrieved. Retrieving by block (e.g., 10000 at a time) largely eliminates the contention issue, because each client only updates the shared node once per block rather than once per ID.
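
To make the block idea concrete, here is a rough sketch in Python. It is not the library's actual code. VersionedCounter is a hypothetical in-memory stand-in for the persistent node (a value plus a version number), and BlockAllocator, next_id and set_if_version are names made up for the illustration. Against a real ensemble, the read and the conditional write would correspond to getData and a versioned setData, with a retry when the version check fails.

```python
import threading


class VersionedCounter:
    """Hypothetical in-memory stand-in for a node holding an integer and a version."""

    def __init__(self, start=0):
        self._value = start
        self._version = 0
        self._lock = threading.Lock()

    def get(self):
        """Return (value, version), like reading the node data and its stat."""
        with self._lock:
            return self._value, self._version

    def set_if_version(self, new_value, expected_version):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False  # lost the race; caller should re-read and retry
            self._value = new_value
            self._version += 1
            return True


class BlockAllocator:
    """Hands out IDs locally, touching the shared counter once per block."""

    def __init__(self, store, block_size=10000):
        self._store = store
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound; start with an empty block
        self._lock = threading.Lock()

    def _reserve_block(self):
        # Optimistic-locking loop: read, try a conditional write, retry on conflict.
        while True:
            value, version = self._store.get()
            if self._store.set_if_version(value + self._block_size, version):
                return value, value + self._block_size

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._reserve_block()
            result = self._next
            self._next += 1
            return result
```

With a block size of 10000, a client that hands out a million IDs makes about 100 conditional writes instead of a million, which is where the drop in contention comes from. The trade-off is that IDs are unique across clients but not issued in global order.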

Then, if you're finished assigning IDs from that block but still have a bunch of IDs left in it, the library has another function to "push back" the unused IDs. They'll then get pulled again in the next block retrieval.
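
One way the "push back" could work, sketched in Python. This is an assumption about a possible design, not a description of the library: the shared state keeps a list of returned ranges next to the counter, and a block request is served from that list before the counter is advanced. SequenceState, take_block and push_back are hypothetical names, and the lock stands in for whatever atomic versioned update the real shared node would need.

```python
import threading


class SequenceState:
    """Hypothetical shared state: next unissued ID plus ranges handed back by clients."""

    def __init__(self):
        self.next_unissued = 0
        self.returned = []  # half-open (start, end) ranges that were pushed back
        self._lock = threading.Lock()  # stands in for an atomic versioned update

    def take_block(self, size):
        """Return a half-open range of at most `size` IDs, reusing returned IDs first."""
        with self._lock:
            if self.returned:
                start, end = self.returned.pop(0)
                if end - start > size:
                    # Keep the tail of an oversized returned range for a later request.
                    self.returned.insert(0, (start + size, end))
                    end = start + size
                return start, end
            start = self.next_unissued
            self.next_unissued += size
            return start, start + size

    def push_back(self, start, end):
        """Hand back the unused half-open range [start, end)."""
        with self._lock:
            if start < end:
                self.returned.append((start, end))
```

In this sketch a block served from returned IDs can be smaller than the requested size, so the caller has to use the range it actually got. Reused IDs also come out later than higher-numbered ones, so uniqueness holds but ordering does not.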

We don't actually have this code running in production yet, so I can't vouch for how well it works. But the design was reviewed and given the thumbs up by the core developers on the team, and the implementation passes all my unit tests.

HTH. Feel free to email back with specific questions if you'd like more details.

DR
