Oh I see. Well I am looking into sharding for both reasons. Actually, my use case is with geospatial data (OSM and more) which means I can distribute the data to servers in the relevant regions.
I am thinking that I may be able to get away with clustering multiple standalone MongoDB instances by "sharding" the data myself. Luckily my use case is rather simple, and I am able to partition the dataset easily. I can implement some form of service discovery to find and connect to the various MongoDB instances in the cluster. However, with this setup, I am confused about how I would have a single OAK instance connect to the various MongoDB instances. Would I need to create a new Repository instance for each? Is this the right approach?

> Path based sharding is currently not implemented. Some initial work is
> done in OAK-3401/OAK-3426 but it's still not part of trunk.
>
> Are you looking for sharding to scale out writes or for geo distributed
> setups?
>
> Chetan Mehrotra
>
> On Sat, Sep 9, 2017 at 6:24 PM, Jon McPherson <[email protected]> wrote:
> > I am struggling to find enough documentation and examples for constructing
> > and using Jackrabbit OAK in a clustered environment through sharding node
> > stores by path. I know this is possible because there are references in a
> > few places but with very little information.
> >
> > Take a look at slide 17 in this PDF which lists the various sharding
> > strategies.
> > http://events.linuxfoundation.org/sites/events/files/slides/the%20architecture%20of%20Oak.pdf
> >
> > My use case is that I need to have several remote servers all running the
> > same Jackrabbit OAK application which uses the DocumentNodeStore backed by
> > MongoDB for the node and blob storage. What I ultimately want is to shard
> > (or partition) portions of my data across these remote servers organized by
> > different paths in the overall node structure.
> >
> > For example:
> >
> > *Server (A)*
> > Is responsible for storing content at /a/*
> >
> > *Server (B)*
> > Is responsible for storing content at /b/*
> >
> > If Server (A) wants to read or write content at /b/*, it can access nodes
> > at that path using the normal JCR or OAK APIs which should completely
> > abstract the user from the network details and the connection to the Server
> > (B) MongoDB.
> >
> > Is there any solid documentation relating to this use case? If not, what
> > is the best way to go about learning this? I can spend the whole day
> > wandering through the OAK source code, but documentation would be much
> > preferred.
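
P.S. To make the manual "sharding" idea at the top of this message concrete, here is a minimal sketch of the routing step only: given a content path, pick the MongoDB instance that owns it by longest matching path prefix. Everything in it is an assumption for illustration: `PathShardRouter` is a hypothetical class, not an Oak API, and the host names and database name in the URIs are made up. It deliberately stops at returning a connection URI; how each URI is then turned into a node store or Repository is left open, since that is exactly the open question in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Oak): maps a content path to the MongoDB
 * URI of the standalone instance that owns it, using the longest registered
 * path prefix.
 */
final class PathShardRouter {
    // Normalized path prefix (always ends with '/') -> MongoDB URI.
    private final Map<String, String> shards = new HashMap<>();

    /** Registers a shard for everything at or below the given path prefix. */
    void register(String pathPrefix, String mongoUri) {
        shards.put(normalize(pathPrefix), mongoUri);
    }

    /** Returns the URI of the shard whose prefix covers the path, if any. */
    Optional<String> resolve(String path) {
        String candidate = normalize(path);
        while (true) {
            String uri = shards.get(candidate);
            if (uri != null) {
                return Optional.of(uri);
            }
            // Drop the last segment: /a/x/y/ -> /a/x/ -> /a/ -> /
            int cut = candidate.lastIndexOf('/', candidate.length() - 2);
            if (cut < 0) {
                return Optional.empty();
            }
            candidate = candidate.substring(0, cut + 1);
        }
    }

    // Ensures a leading and a trailing slash so that "/a" never matches "/ab".
    private static String normalize(String path) {
        String s = path.startsWith("/") ? path : "/" + path;
        return s.endsWith("/") ? s : s + "/";
    }

    public static void main(String[] args) {
        PathShardRouter router = new PathShardRouter();
        // Made-up hosts standing in for Server (A) and Server (B).
        router.register("/a", "mongodb://server-a.example:27017/oak");
        router.register("/b", "mongodb://server-b.example:27017/oak");

        System.out.println(router.resolve("/b/content/node"));
        System.out.println(router.resolve("/c/unassigned"));
    }
}
```

Service discovery would replace the hard-coded `register` calls. Paths that no shard covers come back as an empty `Optional`, so the caller has to decide whether that is an error or a fallback to a default instance.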
