Re: Supporting "resumable" operations on a large tree

2017-02-24 Thread Thomas Mueller
Hi, >So we can implement a "paginated tree traversal" Yes, I think that's a first step, something for oak-core that can be reused in multiple places. It might make sense to also create a JCR version for other use cases. Regards, Thomas
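A minimal sketch of what a "paginated tree traversal" could look like, written in Python purely for illustration. It assumes the tree is a nested dict, that children are visited in sorted name order, and that the last processed path is the only state kept between pages. The names `walk_after` and `next_page` are hypothetical and are not part of oak-core or JCR.

```python
from itertools import islice
from typing import Dict, Iterator, List, Optional, Tuple

Tree = Dict[str, "Tree"]   # hypothetical stand-in for a node and its children
Path = Tuple[str, ...]     # a path as a tuple of node names


def walk_after(tree: Tree, after: Optional[Path] = None,
               prefix: Path = ()) -> Iterator[Path]:
    """Yield paths in pre-order with sorted children, strictly after `after`."""
    for name in sorted(tree):
        path = prefix + (name,)
        if after is not None:
            head = after[:len(path)]
            if path < head:
                # This whole subtree was handled before the resume point.
                continue
            if path == head:
                # The resume point is this node or one of its descendants,
                # so the node itself was already processed.
                rest = after if len(after) > len(path) else None
                yield from walk_after(tree[name], rest, path)
                after = None
                continue
            # path > head: we are past the resume point (it may have been deleted).
            after = None
        yield path
        yield from walk_after(tree[name], None, path)


def next_page(tree: Tree, after: Optional[Path], page_size: int) -> List[Path]:
    """Return up to `page_size` paths that follow `after` in traversal order."""
    return list(islice(walk_after(tree, after), page_size))


if __name__ == "__main__":
    demo = {"a": {"b": {"c": {}}, "d": {}}, "e": {}}
    last = None
    while True:
        page = next_page(demo, last, 2)
        if not page:
            break
        print(page)
        last = page[-1]   # persist this value to resume after an interruption
```

Resuming only needs the last path of the previous page, which works here because the sorted child order makes the traversal order stable. Whether the real repository can offer such an order cheaply is exactly the open question in this thread.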

Re: Supporting "resumable" operations on a large tree

2017-02-24 Thread Chetan Mehrotra
Hi Thomas, On Fri, Feb 24, 2017 at 1:09 PM, Thomas Mueller wrote: > 9) Sorting of path is needed, so that the repository can be processed bit by bit by bit. For that, the following logic is used, recursively: read at most 1000 child nodes. If there are more than 1000, then

Re: Supporting "resumable" operations on a large tree

2017-02-23 Thread Thomas Mueller
Hi, My suggestion is to _not_ support "resumable" operations on a large tree, and instead to avoid large operations altogether. I wouldn't call my solution "sharding", but rather "bit-by-bit reindexing". Some more details: For indexing (especially synchronous property indexes) I suggest to do the

Re: Supporting "resumable" operations on a large tree

2017-02-23 Thread Vikas Saurabh
Hi, A quick side-question related to what Stefan mentioned earlier: > A stable traversal order at a given revision + node seems like a prerequisite to me. Javadoc of NodeState#getChildNodeEntries says: " Multiple iterations are guaranteed to return the child nodes in the same order, but the

Re: Supporting "resumable" operations on a large tree

2017-02-21 Thread Thomas Mueller
Hi, For re-indexing, there are actually two problems: * Indexing can take multiple days, so resume would be nice * For synchronous indexes, indexing creates a large commit, which is problematic (especially for MongoDB) To solve both problems ("kill two birds with one stone"), we could instead try

Re: Supporting "resumable" operations on a large tree

2017-02-20 Thread Marcel Reutegger
On 20/02/17 15:27, Alex Parvulescu wrote: What about walking the revision history in smaller chunks? Given a repository history of revisions: r0, r1, ..., r100, the indexer would now diff [r0, r100] which can be resource intensive. What if it diffs by a window of size 10: [r0, r9] (mark), [r10,
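The windowed walk described above can be sketched as follows, in Python for illustration only. The sketch assumes revisions are numbered by plain integers, that a "mark" is simply the last revision of the most recently completed window, and that the caller supplies the callbacks `diff_window`, `load_mark` and `save_mark`. All of these names are hypothetical and do not correspond to an existing Oak API. The window boundaries follow the notation in the message ([r0, r9], [r10, ...]). Whether consecutive windows should instead share a boundary revision is left open here.

```python
from typing import Callable, Iterator, Optional, Tuple


def revision_windows(first: int, last: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) windows that cover first..last."""
    if size <= 0:
        raise ValueError("size must be positive")
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield (start, end)
        start = end + 1


def process_in_windows(first: int, last: int, size: int,
                       diff_window: Callable[[int, int], None],
                       load_mark: Callable[[], Optional[int]],
                       save_mark: Callable[[int], None]) -> None:
    """Diff window by window, saving a mark after each completed window."""
    mark = load_mark()
    resume_from = first if mark is None else mark + 1
    for start, end in revision_windows(resume_from, last, size):
        diff_window(start, end)   # the potentially expensive step
        save_mark(end)            # a restart continues after this revision


if __name__ == "__main__":
    state = {"mark": None}
    process_in_windows(
        0, 100, 10,
        diff_window=lambda s, e: print(f"diff [r{s}, r{e}]"),
        load_mark=lambda: state["mark"],
        save_mark=lambda m: state.update(mark=m),
    )
```

If the process stops after a window completes, the saved mark lets the next run skip everything up to that revision. Only the window in progress at the time of the interruption has to be repeated.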

Re: Supporting "resumable" operations on a large tree

2017-02-20 Thread Davide Giannella
On 20/02/2017 13:44, Marcel Reutegger wrote: > Instead of the revision, the implementation can also rely on a checkpoint that marks the snapshot of the repository as the basis of the large-tree-operation. I was thinking the same. We may rely on checkpoints and then store additional info the