Thank you, Claus.
I'll try to configure a cluster and see how it works.
Another thing about the index rebuild: my computer's CPU usage stayed
at a constant 16%, and disk access and memory usage were low too. So my
computer's resources weren't fully used. I ran it in debug mode and saw
this message many times:
"Executor is under load, will schedule 1987 remaining tasks for 50 ms later"
Digging deeper, I found that Jackrabbit creates a text extraction task
as a low priority task. The execution of this kind of task is controlled
by the value "maxLoadForLowPriorityTasks" in the JackrabbitThreadPool,
which is defined by the system property
"org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks".
If this value doesn't exist or isn't between 0 and 100, Jackrabbit
uses 75 by default. This value is used to decide whether a low priority
task can be executed, by checking the number of threads that are active
at that moment. With the default value, if more than 75% of the threads
are in use, the task will be scheduled for later.
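To make the rule above concrete, here is a minimal sketch of how I understand the check. This is not Jackrabbit's actual code: the class and method names are my own, and computing the load as active threads divided by pool size is an assumption based on the behaviour described above.

```java
// Hypothetical illustration only; names and load formula are assumptions,
// not copied from JackrabbitThreadPool.
class LowPriorityLoadCheck {
    static final String PROPERTY =
        "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";
    static final int DEFAULT_MAX_LOAD = 75;

    // Missing, non-numeric, or out-of-range (not 0..100) values fall back to 75.
    static int resolveMaxLoad(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_LOAD;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return (value >= 0 && value <= 100) ? value : DEFAULT_MAX_LOAD;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_LOAD;
        }
    }

    // Returns true when a low priority task should be rescheduled for later.
    static boolean shouldDefer(int activeThreads, int poolSize, int maxLoad) {
        if (maxLoad == 0) {
            return false; // 0 turns the check off, as observed in my test
        }
        double loadPercent = 100.0 * activeThreads / poolSize;
        return loadPercent > maxLoad;
    }
}
```

With the default of 75, a pool of 10 threads with 8 active would defer the extraction task, while the same pool with 7 active would run it.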
So I set the parameter
"org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks"
to "0", which makes Jackrabbit skip this check, and the process was
faster: about 2 hours to complete the rebuild. The CPU usage fluctuated
between 50% and 90%, memory was used up to the limit and the HD was
accessed more constantly. It may be better to increase the allocated
memory before you do this.
In my scenario, it makes sense to set this value to "0", because while
the rebuild process is executing, my client can't use the system, so I
can use all the resources that I have to finish as soon as possible.
After the rebuild process, you should remove the parameter, so
Jackrabbit can control the execution of low priority tasks again.
Maybe this can help someone who has to rebuild the index as soon as
possible and doesn't have the cluster mentioned by Claus.
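For anyone who starts the repository from their own Java code, the set-then-remove steps described above could look roughly like this. It is only a sketch: the class and method names are made up for illustration, and I am assuming the property has to be set before the repository (and its thread pool) is created.

```java
// Hypothetical helper; only the property name comes from Jackrabbit.
class RebuildPropertyToggle {
    static final String PROPERTY =
        "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";

    // Set the value to "0" before starting the repository for the rebuild.
    // Returns the previous value (null if the property was not set).
    static String disableLoadCheck() {
        return System.setProperty(PROPERTY, "0");
    }

    // After the rebuild, put things back so Jackrabbit throttles
    // low priority tasks again.
    static void restore(String previous) {
        if (previous == null) {
            System.clearProperty(PROPERTY);
        } else {
            System.setProperty(PROPERTY, previous);
        }
    }
}
```

The same value can also be passed on the command line as a JVM option of the form -D<property name>=0 for the rebuild run only, and left out of later starts.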
On 23/11/2012 06:29, KÖLL Claus wrote:
Hi Nelson,
1) Reading the source code, jackrabbit is using LazyTextExtractorField (and
other classes) to execute the extraction in a separate thread.
Doesn't it do exactly what I want? But, even so I waited 3 hours and the
repository wasn't initialized and ready to use. Is it normal?
First .. yes this is normal ..
and yes you are right about extraction in a separate thread .. this happens on
session.save() operation. If you start the repository and the index is not
present, it will start to re-index.
In that way jackrabbit does not separate between full text indexing and
"normal" node/property indexing. So the start will take much time
depending on your content.
2) Is what I'm planning to do the best approach? Has anybody done something
similar?
One way to handle such index recovering is to create a cluster. Let's assume
you would have 2 cluster members where one is the primary and the other one is
a hot standby member.
If you have problems with the index on the primary cluster member you could
copy the index folder from the standby cluster member.
If you like you could re-index the repository on your standby member while the
primary is running.
greets
claus