Fangmin Lv created ZOOKEEPER-3619:
-------------------------------------

             Summary: Implement server side semaphore API to improve the 
efficiency and throughput of coordination 
                 Key: ZOOKEEPER-3619
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3619
             Project: ZooKeeper
          Issue Type: New Feature
          Components: server
    Affects Versions: 3.6.0
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv
             Fix For: 3.6.0


The ZK API is designed to be simple, flexible, and general, so it can serve 
different scenarios such as coordination, member health tracking, metadata 
storage, etc. 

But this general design has a cost: it leads to heavy and inefficient client 
code for recipes such as distributed lock, semaphore, etc.

Currently, the general client-side semaphore implementation without waiting 
time works as follows:
 # client A creates sequential and ephemeral node N-1
 # client B creates sequential and ephemeral node N-2
 # clients A and B query all children and check whether they hold the lock 
node with the smallest sequential id 
 # since client A has the smaller sequential id, it is the semaphore owner 
(assume semaphore value is 1)
 # client B deletes its node, closes the session, and probably tries again 
later from step 2

Every contender issues 4 writes (create session, create lock, delete lock, 
close session) and 1 read (get children), which is pretty heavy and does not 
scale well.
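
The recipe and its operation count can be illustrated with a small in-memory 
simulation. This is only a sketch: FakeEnsemble and try_acquire are 
hypothetical names, not part of the ZooKeeper client API, and the counters 
simply model which calls would be quorum writes and which would be reads.

```python
from collections import Counter


class FakeEnsemble:
    """Hypothetical in-memory stand-in for an ensemble (not the real ZK API)."""

    def __init__(self):
        self.children = {}       # sequence number -> owning session id
        self.next_seq = 0
        self.next_session = 0
        self.writes = Counter()  # session id -> number of write requests
        self.reads = Counter()   # session id -> number of read requests

    def create_session(self):
        self.next_session += 1
        self.writes[self.next_session] += 1
        return self.next_session

    def create_sequential_ephemeral(self, sid):
        self.next_seq += 1
        self.children[self.next_seq] = sid
        self.writes[sid] += 1
        return self.next_seq

    def get_children(self, sid):
        self.reads[sid] += 1
        return sorted(self.children)

    def delete(self, sid, seq):
        del self.children[seq]
        self.writes[sid] += 1

    def close_session(self, sid):
        # Any ephemeral nodes still owned by the session disappear with it.
        for seq in [s for s, owner in self.children.items() if owner == sid]:
            del self.children[seq]
        self.writes[sid] += 1


def try_acquire(zk, sid, permits=1):
    """Client-side recipe without waiting: create, list, then give up if lost."""
    seq = zk.create_sequential_ephemeral(sid)
    winners = zk.get_children(sid)[:permits]  # smallest sequence ids win
    if seq in winners:
        return True
    zk.delete(sid, seq)
    zk.close_session(sid)
    return False


zk = FakeEnsemble()
a, b = zk.create_session(), zk.create_session()
got_a = try_acquire(zk, a)  # client A holds the smallest sequence id
got_b = try_acquire(zk, b)  # client B loses and cleans up
print(got_a, got_b, zk.writes[b], zk.reads[b])  # True False 4 1
```

In this model the losing client B pays 4 writes and 1 read without ever 
holding the semaphore; client A reaches the same 4 writes once it releases 
the node and closes its session.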

We actually hit this issue internally with one heavy semaphore use case, and 
we had to create dozens of ensembles to support their traffic.

To make the semaphore recipe more efficient, we can move the semaphore 
implementation to the server side, where the leader has all the context about 
who will win the semaphore/lock at txn preparation time. The leader can then 
short-circuit and fail the contender directly, without proposing and 
committing those create/delete lock transactions.
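
A minimal sketch of the short-circuit idea, assuming the leader keeps the 
current holders of each semaphore in memory. FakeLeader, try_acquire, release, 
and the proposals counter are hypothetical names for illustration only; the 
actual API has not been designed yet, and session handling is left out.

```python
class FakeLeader:
    """Hypothetical model of a leader that tracks semaphore holders."""

    def __init__(self):
        self.holders = {}   # semaphore path -> set of holder session ids
        self.proposals = 0  # transactions that would go through quorum

    def try_acquire(self, path, sid, permits=1):
        held = self.holders.setdefault(path, set())
        if sid in held:
            return True       # already a holder, nothing to propose
        if len(held) >= permits:
            return False      # short circuit: fail without proposing
        self.proposals += 1   # only a successful acquire is proposed
        held.add(sid)
        return True

    def release(self, path, sid):
        held = self.holders.get(path, set())
        if sid in held:
            self.proposals += 1
            held.remove(sid)


leader = FakeLeader()
got_a = leader.try_acquire("/sem", "A")  # wins the single permit
got_b = leader.try_acquire("/sem", "B")  # rejected at preparation time
print(got_a, got_b, leader.proposals)    # True False 1
```

In this model the losing contender costs no proposal at all, compared with 
the create and delete lock transactions it triggers in the client-side recipe.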

To implement this, we need to add a new semaphore API, which is supposed to 
replace client-side lock, leader election (semaphore value 1), and general 
semaphore use cases.

We started to design and implement it recently. It will be based on another 
big improvement that we've almost finished and will soon upstream in 
ZOOKEEPER-3594, which skips proposing requests with error transactions.

Meanwhile, we'd like to hear some early feedback from the community about this 
feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
