On 19.09.2022 at 15:07, Dennis van der Laan wrote:
Hi all,
Question 1: is there a way to quickly/efficiently optimize the Lucene
indexes of Jackrabbit?
Question 2: is there a way to have the Lucene indexes shared amongst
cluster nodes, e.g. in a database?
Some context:
We've been using Jackrabbit for years now as the content repository of our
university websites (both public and internal). We use a cluster of 12 JCR
nodes (physically on 6 servers). Our content is about 50 GB and consists of
XML files (the editorial content), PDFs and images. There are about 450,000
documents (200,000 XML files).
There are almost 1000 editors managing the content, although all employees
have some sort of editorial rights on part of the content (their profile
pages).
Over time, queries become noticeably slower and we have to 're-index' the
content. We shut down a cluster node, delete all Lucene index files and
restart the cluster node. All content is then downloaded from the database
and re-indexed. This process takes about 5 hours per node.
I was wondering if other users recognize this process and maybe have found
a better solution. I was also wondering if there is a better way to store
the Lucene indexes, not on a local file system, but on shared storage
like a database, so multiple cluster nodes can use the same index. This way
it would be much easier to add another cluster node. That process also
takes 5 hours in our current setup.
Thanks for any advice!
Kind regards,
Dennis van der Laan
Hi Dennis,
development of Jackrabbit more or less stopped a few years ago (apart from
bugfixes and dependency updates).
You may want to look at Jackrabbit Oak, which will indeed support
clustering with a single persistence (including index files), when using
the DocumentNodeStore (usually with MongoDB).
Best regards, Julian