Hi all,

Question 1: is there a way to quickly/efficiently optimize the Lucene
indexes of Jackrabbit?
Question 2: is there a way to share the Lucene indexes amongst
cluster nodes, e.g. in a database?

Some context:
We've been using Jackrabbit for years now as the content repository of our
university websites (both public and internal). We run a cluster of 12 JCR
nodes (physically on 6 servers). Our content is about 50 GB and consists of
XML files (the editorial content), PDFs, and images. There are about 450,000
documents (200,000 XML files).
Almost 1,000 editors manage the content, although all employees have some
editorial rights on part of the content (their profile pages).
Over time, queries become noticeably slow and we have to 're-index' the
content: we shut down a cluster node, delete all Lucene index files, and
restart the node. All content is then downloaded from the database and
re-indexed. This process takes about 5 hours per node.
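For clarity, a minimal sketch of the per-node steps (REPO_HOME, the service
commands, and the workspace name are placeholders, not our real deployment):

```shell
#!/bin/sh
# Sketch of our per-node re-index procedure. REPO_HOME defaults to a
# scratch directory here so the cleanup can be dry-run safely; substitute
# your actual Jackrabbit repository home and start/stop commands.
REPO_HOME=${REPO_HOME:-/tmp/jackrabbit-demo}

# Simulate a repository layout so the cleanup below has something to act on.
mkdir -p "$REPO_HOME/workspaces/default/index" "$REPO_HOME/repository/index"

# Step 1: stop the cluster node (placeholder; depends on your deployment).
# service jackrabbit stop

# Step 2: remove the Lucene indexes -- one directory per workspace, plus
# the version-store index under repository/index.
rm -rf "$REPO_HOME"/workspaces/*/index "$REPO_HOME"/repository/index

# Step 3: start the node again; Jackrabbit streams all content back from
# the persistence manager (the database) and rebuilds the index.
# service jackrabbit start
```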
I was wondering whether other users recognize this process and have perhaps
found a better solution. I was also wondering whether there is a better way
to store the Lucene indexes: not on a local file system but on shared
storage such as a database, so multiple cluster nodes can use the same
index. That would make it much easier to add another cluster node, a
process that also takes 5 hours in our current setup.
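For reference, we use the default per-workspace index location; a minimal
sketch of the relevant SearchIndex section in workspace.xml (the parameter
value shown is the default, not a tuned setting of ours):

```xml
<!-- workspace.xml: the Lucene index that gets rebuilt on restart.
     Deleting the directory below triggers a full re-index at startup. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Index files live under the workspace home on the local file system -->
  <param name="path" value="${wsp.home}/index"/>
</SearchIndex>
```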

Thanks for any advice!
Kind regards,
Dennis van der Laan

-- 
D.G. van der Laan, MSc
Sr. Software Engineer, team Content Management System & Online Development
Center for Information Technology
+31 (0)50 363 9273
