Should I use JackRabbit?

Harris, Christopher P Wed, 22 Apr 2015 09:30:03 -0700

Hi.

I need help deciding whether I should use a JCR implementation, specifically 
Jackrabbit or Jackrabbit Oak.  I've set up and configured my own Apache Archiva 
instance, which is how I first became familiar with Jackrabbit.  Learning from that 
experience, I keep thinking that Jackrabbit may fit my needs.


I'm the sole developer of a Java EE 6 web application, whose target platform will be 
a clustered (2 JVMs) WAS v8.0 environment, that will read my company's 
organizational hierarchy, hosted on Microsoft MySites (SharePoint), into a data 
store that will be exposed to clients using JAX-RS resources that return JSON.  
Building this hierarchy tree takes quite some time (let's estimate 30-60 
minutes), even when using multi-threading and queries that return groups of 
people.  The number of employees is currently around 40,000 - 60,000.  
I start from the CEO at the top and work my way down the hierarchy tier by 
tier.
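
For illustration, the top-down walk described above might look roughly like the 
sketch below.  ReportsLookup is a hypothetical stand-in for whatever MySites 
query returns the direct reports for a group of managers in one call; it is 
not a real SharePoint API, and the employee IDs are plain Strings only for 
simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one remote call that returns managerId -> direct report IDs
// for a whole group of managers.
interface ReportsLookup {
    Map<String, List<String>> directReportsOf(List<String> managerIds);
}

final class TierWalker {
    // Returns employeeId -> managerId for everyone below the root,
    // visiting the hierarchy one tier at a time.
    static Map<String, String> walk(String rootId, ReportsLookup lookup) {
        Map<String, String> managerOf = new LinkedHashMap<String, String>();
        List<String> tier = Collections.singletonList(rootId);
        while (!tier.isEmpty()) {
            Map<String, List<String>> reports = lookup.directReportsOf(tier);
            List<String> next = new ArrayList<String>();
            for (Map.Entry<String, List<String>> e : reports.entrySet()) {
                for (String report : e.getValue()) {
                    // Skip anyone already seen so a cycle in the source
                    // data cannot make the walk loop forever.
                    if (!report.equals(rootId) && !managerOf.containsKey(report)) {
                        managerOf.put(report, e.getKey());
                        next.add(report);
                    }
                }
            }
            tier = next; // the reports found here form the next tier
        }
        return managerOf;
    }
}
```

Each pass issues one lookup per tier, so the number of round trips follows the 
depth of the hierarchy rather than the number of employees.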

The data will consist mostly of Strings plus 1 optional image, depending on 
whether the employee has one.

I'm currently storing my data in a Map and using Lucene to index the data as a 
POC.  As you can imagine, the data returned by my JAX-RS resource is lightning 
fast.  Since I'm still in the POC phase, I'm currently running my app on a 
single local WAS v8.0 instance.

However, I have some problems, as I'm sure you have already realized:

1.)    I need the data to persist through app redeployments and restarts.

2.)    Data refreshing will be troublesome.  With the current POC, I would need 
to create a 2nd in-memory Map to replace the first.  This scenario might result 
in a user experiencing a service interruption when replacing the first Map with 
the 2nd Map.
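
To make problem 2 concrete, here is a minimal sketch of how the second Map 
could replace the first without readers ever seeing a half-built Map, using 
an AtomicReference from the JDK.  The class and method names are 
hypothetical, and the employeeId -> managerId Map is a simplification of the 
real data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the in-memory hierarchy (names are illustrative).
final class HierarchySnapshotHolder {
    // Readers always see one complete, read-only snapshot.
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<Map<String, String>>(
                    Collections.<String, String>emptyMap());

    // Called by the JAX-RS resources on every request; never blocks.
    String managerOf(String employeeId) {
        return current.get().get(employeeId);
    }

    int size() {
        return current.get().size();
    }

    // Called by the refresh job once the 2nd Map is fully built.
    // The copy keeps later changes to 'rebuilt' away from readers.
    void publish(Map<String, String> rebuilt) {
        current.set(Collections.unmodifiableMap(
                new HashMap<String, String>(rebuilt)));
    }
}
```

A swap like this only covers a single JVM and does nothing for problem 1, so 
it does not remove the need for a persistent store.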

Jackrabbit seems like the solution to these issues due to its ability to 
persist data to disk or an RDBMS and its MVCC support.

I should point out that I will have an Oracle 12c db available if I need it.

If I choose Jackrabbit, I'm aiming to achieve JAX-RS data retrieval speeds 
comparable to my in-memory, Lucene-indexed Map implementation.  My end users 
will be business folk using iPhones, iPads, and Android devices.  They want 
speed.

I have some questions:

1.)    Does JCR fit my needs?  I think that it fits...

2.)    If so, which PersistenceManager & FileSystem configuration do you 
recommend for my scenario?

3.)    Every time I need to refresh the employee hierarchy, I will need to read 
the entire employee hierarchy from MySites again.  This data refresh could 
happen 1-5 times a day, depending on whether we need employee data refreshed 
for different regions around the world.  I need to account for changes among 
top-level executives, including the CEO.  In the hypothetical situation where 
the CEO changes, and with it the topmost node in the Jackrabbit node tree, 
will this result in a huge set of revisions, numbering in the tens of thousands?

4.)    If so, can Jackrabbit handle such a large set of revisions?

5.)    If so, can Jackrabbit handle such a large set of revisions relatively 
quickly?

6.)    I'll be converting the Atom or JSON responses from MySites into JAXB 
POJOs.  I prefer to reuse my POJOs for different purposes (JAX-RS, JPA, CDI, 
etc.).  I know that Jackrabbit has OCM, but I haven't been able to verify that 
the OCM subproject works with Jackrabbit Oak.  I see that on your GitHub, the 
pom.xml in trunk lists Jackrabbit 2.4.1 and makes no mention of Jackrabbit Oak.  
Does the OCM subproject work with Jackrabbit Oak?


I know that I've written a lot, and I apologize to anyone who has had to read 
through it all.  This application has been challenging, and I felt that I 
needed to include all this info so that my needs are fully understood.  The 
journey has been long and arduous, working through integration hurdles and the best 
architectural choices (yay for WAS v8.0 ;-P ).  I'm sorely in need of resolving 
these JCR questions.  Any help would be very much appreciated.



- Chris Harris
