[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-423:
-------------------------------------

    Description: 
When building the distributed queue for my tutorial blog post, it was pointed 
out to me that there's a serious inefficiency here. 

Informally, the items in the queue are created as sequential nodes. For a 
'dequeue' call, all items are retrieved and sorted by name by the client in 
order to find the name of the next item to try and take. This costs O( n ) 
bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this doesn't 
scale very well. 

If the servers were able to maintain a data structure that allowed them to 
efficiently retrieve the children of a node in order of the zxid that created 
them this would make successful dequeue operations O( 1 ) at the cost of O( n ) 
memory on the server (to maintain, e.g. a singly-linked list as a queue). This 
is a win if it is generally true that clients only want the first child in 
creation order, rather than the whole set. 

We could expose this to the client via this API: getFirstChild(handle, path, 
name_buffer, watcher) which would have much the same semantics as getChildren, 
but only return one znode name. 

Sequential nodes would still allow the ordering of znodes to be made explicitly 
available to the client in one RPC should it need it. Although: since this 
ordering would now be available cheaply for every set of children, it's not 
completely clear that there would be that many use cases left for sequential 
nodes if this API was augmented with a getChildrenInCreationOrder call. 
However, that's for a different discussion. 

A halfway-house alternative with more flexibility is to add an 'order' 
parameter to getFirstChild and have the server compute the first child 
according to the requested order (creation time, update time, lexicographical 
order). This saves bandwidth at the expense of increased server load, although 
servers can be implemented to spend memory on pre-computing commonly requested 
orders. I am only in favour of this approach if servers maintain a 
data-structure for every possible order, and then the memory implications need 
careful consideration.

[edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

  was:
When building the distributed queue for my tutorial blog post, it was pointed 
out to me that there's a serious inefficiency here. 

Informally, the items in the queue are created as sequential nodes. For a 
'dequeue' call, all items are retrieved and sorted by name by the client in 
order to find the name of the next item to try and take. This costs O(n) 
bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this doesn't 
scale very well. 

If the servers were able to maintain a data structure that allowed them to 
efficiently retrieve the children of a node in order of the zxid that created 
them this would make successful dequeue operations O(1) at the cost of O(n) 
memory on the server (to maintain, e.g. a singly-linked list as a queue). This 
is a win if it is generally true that clients only want the first child in 
creation order, rather than the whole set. 

We could expose this to the client via this API: getFirstChild(handle, path, 
name_buffer, watcher) which would have much the same semantics as getChildren, 
but only return one znode name. 

Sequential nodes would still allow the ordering of znodes to be made explicitly 
available to the client in one RPC should it need it. Although: since this 
ordering would now be available cheaply for every set of children, it's not 
completely clear that there would be that many use cases left for sequential 
nodes if this API was augmented with a getChildrenInCreationOrder call. 
However, that's for a different discussion. 

A halfway-house alternative with more flexibility is to add an 'order' 
parameter to getFirstChild and have the server compute the first child 
according to the requested order (creation time, update time, lexicographical 
order). This saves bandwidth at the expense of increased server load, although 
servers can be implemented to spend memory on pre-computing commonly requested 
orders. I am only in favour of this approach if servers maintain a 
data-structure for every possible order, and then the memory implications need 
careful consideration.


> Add getFirstChild API
> ---------------------
>
>                 Key: ZOOKEEPER-423
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: contrib-bindings, documentation, java client, server
>            Reporter: Henry Robinson
>
> When building the distributed queue for my tutorial blog post, it was pointed 
> out to me that there's a serious inefficiency here. 
> Informally, the items in the queue are created as sequential nodes. For a 
> 'dequeue' call, all items are retrieved and sorted by name by the client in 
> order to find the name of the next item to try and take. This costs O( n ) 
> bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
> doesn't scale very well. 
> If the servers were able to maintain a data structure that allowed them to 
> efficiently retrieve the children of a node in order of the zxid that created 
> them this would make successful dequeue operations O( 1 ) at the cost of O( n 
> ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
> This is a win if it is generally true that clients only want the first child 
> in creation order, rather than the whole set. 
> We could expose this to the client via this API: getFirstChild(handle, path, 
> name_buffer, watcher) which would have much the same semantics as 
> getChildren, but only return one znode name. 
> Sequential nodes would still allow the ordering of znodes to be made 
> explicitly available to the client in one RPC should it need it. Although: 
> since this ordering would now be available cheaply for every set of children, 
> it's not completely clear that there would be that many use cases left for 
> sequential nodes if this API was augmented with a getChildrenInCreationOrder 
> call. However, that's for a different discussion. 
> A halfway-house alternative with more flexibility is to add an 'order' 
> parameter to getFirstChild and have the server compute the first child 
> according to the requested order (creation time, update time, lexicographical 
> order). This saves bandwidth at the expense of increased server load, 
> although servers can be implemented to spend memory on pre-computing commonly 
> requested orders. I am only in favour of this approach if servers maintain a 
> data-structure for every possible order, and then the memory implications 
> need careful consideration.
> [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to