Re: Apache Jackrabbit OAK - Sharding DocumentNodeStore across cluster by node path

Jon McPherson Tue, 12 Sep 2017 00:45:31 -0700

Oh, I see.
Well, I am looking into sharding for both reasons. My use case involves
geospatial data (OSM and more), which means I can distribute the data to
servers in the relevant regions.


I am thinking that I may be able to get away with clustering multiple
standalone MongoDB instances by "sharding" the data myself. Luckily my use
case is rather simple, and I am able to partition the dataset easily. I can
implement some form of service discovery to find and connect to the various
MongoDB instances in the cluster. However, with this setup, I am unsure how
a single OAK instance would connect to the various MongoDB instances. Would
I need to create a new Repository instance for each? Is this the right
approach?
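
A minimal sketch of the manual routing described above, assuming each
path prefix is owned by exactly one MongoDB instance. PathShardRouter and
the handle type T are hypothetical names for illustration, not part of the
OAK API; T could be a connection URI or a per-instance Repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not OAK API): maps an absolute node path to the
// shard handle registered for its longest matching path prefix.
final class PathShardRouter<T> {
    // Path prefix (for example "/a") -> shard handle.
    private final Map<String, T> shards = new HashMap<>();

    // Declare that everything at or below 'prefix' lives on 'shard'.
    void register(String prefix, T shard) {
        shards.put(normalize(prefix), shard);
    }

    // Walk up the path one segment at a time so the longest prefix wins.
    Optional<T> resolve(String path) {
        String p = normalize(path);
        while (true) {
            T shard = shards.get(p);
            if (shard != null) {
                return Optional.of(shard);
            }
            if (p.equals("/")) {
                return Optional.empty(); // no shard owns this path
            }
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
    }

    // Require an absolute path and drop a single trailing slash.
    private static String normalize(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        boolean trailing = path.length() > 1 && path.endsWith("/");
        return trailing ? path.substring(0, path.length() - 1) : path;
    }
}
```

With "/a" and "/b" registered, resolve("/b/roads/1") returns the handle for
/b, while "/ab" matches neither because matching is done per path segment.
This only covers lookup; operations that span shards (moves, queries across
/a and /b) are outside what the sketch handles.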

> Path based sharding is currently not implemented. Some initial work is
> done in OAK-3401/OAK-3426 but it's still not part of trunk.
> Are you looking for sharding to scale out writes or for geo distributed
> setups?
> Chetan Mehrotra


On Sat, Sep 9, 2017 at 6:24 PM, Jon McPherson <[email protected]> wrote:

> I am struggling to find enough documentation and examples for constructing
> and using Jackrabbit OAK in a clustered environment through sharding node
> stores by path. I know this is possible because there are references in a
> few places but with very little information.
>
> Take a look at slide 17 in this PDF, which lists the various sharding
> strategies:
> http://events.linuxfoundation.org/sites/events/files/slides/the%20architecture%20of%20Oak.pdf
>
> My use case is that I need to have several remote servers all running the
> same Jackrabbit OAK application which uses the DocumentNodeStore backed by
> MongoDB for the node and blob storage. What I ultimately want is to shard
> (or partition) portions of my data across these remote servers organized by
> different paths in the overall node structure.
>
> For example:
>
> *Server (A)*
> Is responsible for storing content at /a/*
>
> *Server (B)*
> Is responsible for storing content at /b/*
>
> If Server (A) wants to read or write content at /b/*, it can access nodes
> at that path using the normal JCR or OAK APIs, which should completely
> shield the user from the network details and the connection to the Server
> (B) MongoDB.
>
> Is there any solid documentation relating to this use case? If not, what
> is the best way to go about learning this? I can spend the whole day
> wandering through the OAK source code, but documentation would be much
> preferred.
>
