Hi,
I have several applications where contents does not have ids or
immutable names when storing them into JCR, therefore we stored
contents below a common root node in a flatten style with same
name sibling feature.
When we have the same type of content in a application we use a custom
hash for cutting the id/name with the better balancing possible (i.e.
for content identified by a lastname we use the last 3 characters,
ex: smith => h/t/i/smith).
Because we can have different type of content and ids are not available
or just different, i need to implement an efficient but simple hash
(no complex balanced tree) for storing these nodes and not having
problem with Jackrabbit design (max 10k subnodes).
A colleague of mine found an algorithm using a counter for creating
a generic hash:
- start a counter from 0
- increment for each new content in an atomic fashion
- convert the counter value to a string with zero-padded and a length
multiple of 3
- reverse the string (for not using the same parent node and better
balancing)
- cut the string into 3 length segments
- create or use an existing node holder with each segment
- the node of the content will be stored below the last node holder
Clearly, if we create 100 nodes (and use 2 length segment) we would have:
root/00/ (node type: contentHolder)
root/00/entry (node type: myContent)
root/10/ (node type: contentHolder)
root/10/entry (node type: myContent)
...
root/90/ (node type: contentHolder)
root/90/content (node type: myContent)
root/01/ (node type: contentHolder)
root/01/content (node type: myContent)
...
...
root/99/ (node type: contentHolder)
root/99/content (node type: myContent)
If we create 2 more nodes (#100 => 0100 => '00'10' and
#101 => 0101 => '10'10') we would have:
root/00/ (node type: contentHolder)
root/00/entry (node type: myContent)
root/00/10 (node type: contentHolder)
root/00/10/entry (node type: myContent)
root/10/ (node type: contentHolder)
root/10/entry (node type: myContent)
root/10/00 (node type: contentHolder)
root/10/00/entry (node type: myContent)
...
root/90/ (node type: contentHolder)
root/90/content (node type: myContent)
root/01/ (node type: contentHolder)
root/01/content (node type: myContent)
...
...
root/99/ (node type: contentHolder)
root/99/content (node type: myContent)
Note that a different parent node is used for storing these 2 new nodes.
The more contents are created, the deeper they are stored.
With 3 length segment, there is maximum of 1000+1 child nodes per node.
This works well but:
- storing the counter in the repository with a synchronized does not work
in clustering, there is no way to do an atomic increment therefore we use
a database in this case
- if we delete a content this create holes in the tree (holder without entry)
- we can manage to store unused holder for reusing purpose but we need to store
them in a efficient and concurrently safe manner (like svnmerge properties)
So, here is my solution but i am very interested if you have the same kind of
problem and use a better algorithm.
--
Sébastien Launay