Hi. I need help deciding whether I should use a JCR implementation, specifically Jackrabbit or Jackrabbit Oak. I've set up and configured my own Apache Archiva instance, which is how I first became familiar with Jackrabbit. Based on that experience, I keep thinking that Jackrabbit may fit my needs.
I'm the sole developer of a Java EE 6 web application whose target platform will be a clustered (2 JVMs) WAS v8.0 environment. The application will read my company's organizational hierarchy, hosted on Microsoft MySites (SharePoint), into a data store that will be exposed to clients through JAX-RS resources that return JSON. Building this hierarchy tree takes quite some time (let's estimate 30-60 minutes), even when using queries that return groups of people and multi-threading. The number of employees is currently around 40,000 - 60,000. I start from the CEO at the top and work my way down the hierarchy in groups of tiers. The data will consist mostly of Strings plus 1 optional image, depending on whether the employee has one.

I'm currently storing my data in a Map and using Lucene to index the data as a POC. As you can imagine, my JAX-RS resource returns data lightning fast. Since I'm still in the POC phase, I'm currently running my app on a single, local WAS v8.0 instance. However, I have some problems, as I'm sure you already realize:

1.) I need the data to persist through app redeployments and restarts.
2.) Data refreshing will be troublesome. With the current POC, I would need to create a 2nd in-memory Map to replace the first. This scenario might result in a user experiencing a service interruption while the first Map is replaced with the 2nd Map.

Jackrabbit seems like the solution to these issues because of its ability to store data, on disk or in an RDBMS, and its MVCC support. I should point out that I will have an Oracle 12c db available if I need it. If I choose Jackrabbit, I'm aiming to achieve JAX-RS data retrieval speeds comparable to my in-memory, Lucene-indexed Map implementation. My end users will be business folk using iPhones, iPads, and Android devices. They want speed.

I have some questions:

1.) Does JCR fit my needs? I think that it fits...
2.) If so, which PersistenceManager & FileSystem configuration do you recommend for my scenario?
3.) Every time I need to refresh the employee hierarchy, I will need to read the entire employee hierarchy from MySites again. This data refresh could happen 1-5 times a day, depending on whether we need employee data refreshed for different regions around the world. I need to account for changes to top-level executives, including the CEO. In the hypothetical situation where the CEO changes, and with it the topmost node in the Jackrabbit node tree, will this result in a huge set of revisions, numbering in the tens of thousands?
4.) If so, can Jackrabbit handle such a large set of revisions?
5.) If so, can Jackrabbit handle such a large set of revisions relatively quickly?
6.) I'll be converting the Atom or JSON responses from MySites into JAXB POJOs. I prefer to reuse my POJOs for different purposes (JAX-RS, JPA, CDI, etc.). I know that Jackrabbit has OCM, but I haven't been able to verify that the OCM subproject works with Jackrabbit Oak. I see that in your GitHub, the pom.xml in the trunk lists Jackrabbit 2.4.1 and makes no mention of Jackrabbit Oak. Does the OCM subproject work with Jackrabbit Oak?

I know that I've written a lot, and I apologize to anyone who has had to read through it all. This application has been challenging, and I felt that I needed to specify all this info so that you can fully understand my needs. The journey of figuring out integration hurdles and the best architectural choices has been long and arduous (yay for WAS v8.0 ;-P ). I'm sorely in need of answers to these JCR questions. Any help would be very much appreciated.

- Chris Harris
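
For reference, a minimal sketch of the Map swap described in problem 2.), assuming the refresh builds a complete 2nd Map off to the side before it replaces the first. The class name HierarchyCache, the String-to-String Map, and the method names are hypothetical stand-ins and not the actual POC code; the Lucene index and the image data are left out. It only illustrates the swap itself and does nothing for problem 1.) (persistence).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the in-memory hierarchy (illustrative names only).
class HierarchyCache {
    // Holds the currently active snapshot; starts out empty.
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<Map<String, String>>(
                    Collections.<String, String>emptyMap());

    // Read path (e.g. called from a JAX-RS resource): reads the active snapshot.
    public String lookup(String employeeId) {
        return current.get().get(employeeId);
    }

    // Refresh path: called only after the 2nd Map has been fully built.
    // The defensive copy keeps later changes to `fresh` out of the snapshot.
    public void publish(Map<String, String> fresh) {
        Map<String, String> snapshot =
                Collections.unmodifiableMap(new HashMap<String, String>(fresh));
        current.set(snapshot); // single reference swap
    }

    // Number of entries in the active snapshot.
    public int size() {
        return current.get().size();
    }
}
```

With this shape, a reader sees either the old Map or the new one, never a partially built one. Each JVM in the cluster would still hold its own copy and need its own refresh, which is part of why a shared store looks attractive.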
