Thank you, Claus.

I'll try to configure a cluster and see how it works.

Another observation about the index rebuild: my computer's CPU usage stayed at 16% the whole time, and disk access and memory usage were low too, so the machine's resources were not being fully used. I ran it in debug mode and saw this message many times:
"Executor is under load, will schedule 1987 remaining tasks for 50 ms later"

Digging deeper, I found that Jackrabbit creates each text extraction task as a low-priority task. Execution of this kind of task is controlled by the value "maxLoadForLowPriorityTasks" in JackrabbitThreadPool, which is read from the system property "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks". If this property is not set, or its value is not between 0 and 100, Jackrabbit uses 75 by default. The value determines whether a low-priority task may run, based on the number of threads that are active at that moment: with the default, if more than 75% of the threads are in use, the task is scheduled for later.
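
To make the rule above concrete, here is a minimal sketch of the check as I understand it. This is not Jackrabbit's actual source: the class and method names (LowPriorityLoadCheck, resolveMaxLoad, shouldDefer) are made up for illustration, and treating "0" as "skip the check" is an assumption based on the behaviour I observed.

```java
// Illustrative sketch only; NOT the real JackrabbitThreadPool code.
final class LowPriorityLoadCheck {
    static final String PROPERTY =
        "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";
    static final int DEFAULT_MAX_LOAD = 75;

    /** Returns the configured percentage, or 75 if it is missing, not a number, or outside 0..100. */
    static int resolveMaxLoad(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_LOAD;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return (value >= 0 && value <= 100) ? value : DEFAULT_MAX_LOAD;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_LOAD;
        }
    }

    /**
     * True if a low-priority task should be postponed.
     * Assumes poolSize > 0; a maxLoad of 0 disables the check (assumption).
     */
    static boolean shouldDefer(int activeThreads, int poolSize, int maxLoad) {
        if (maxLoad == 0) {
            return false;
        }
        int loadPercent = activeThreads * 100 / poolSize;
        return loadPercent > maxLoad;
    }

    public static void main(String[] args) {
        int maxLoad = resolveMaxLoad(System.getProperty(PROPERTY));
        System.out.println("maxLoad = " + maxLoad);
        System.out.println("defer with 7 of 8 threads busy: " + shouldDefer(7, 8, maxLoad));
    }
}
```

With the default of 75, a pool with 7 of 8 threads busy (87%) would postpone the task, while 6 of 8 (exactly 75%) would not.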

So I set the parameter "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks" to "0", which makes Jackrabbit skip this check, and the process was faster: the rebuild took about 2 hours to complete. CPU usage fluctuated between 50% and 90%, memory was used up to the limit, and the disk was accessed more steadily. It is probably better to increase the allocated memory before you run this.

In my scenario it makes sense to set this value to "0", because my client can't use the system while the rebuild is running, so I can use all the resources I have to finish as soon as possible. After the rebuild, you should remove the parameter so that Jackrabbit can control the execution of low-priority tasks again.
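
As a rough sketch of "set it for the rebuild, then remove it", the property can be set from Java before the repository is started and restored afterwards. The class UnthrottledRebuild and the Runnable passed to it are placeholders of mine, standing in for whatever code starts your repository and waits for the re-index. I have not checked whether Jackrabbit re-reads the property after startup, so removing it may only take effect on the next start. Passing -Dorg.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks=0 on the JVM command line for that one run should be equivalent.

```java
// Illustrative sketch only; the rebuild step itself is a placeholder.
final class UnthrottledRebuild {
    static final String PROPERTY =
        "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";

    /** Runs the given rebuild step with the property set to "0", then restores the old state. */
    static void runUnthrottled(Runnable rebuild) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, "0");
        try {
            rebuild.run(); // placeholder: start the repository and wait for the re-index
        } finally {
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        runUnthrottled(() ->
            System.out.println("during rebuild: " + System.getProperty(PROPERTY)));
        System.out.println("after rebuild: " + System.getProperty(PROPERTY));
    }
}
```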

Maybe this can help someone who has to rebuild the index as quickly as possible and doesn't have the cluster Claus mentioned.

On 23/11/2012 06:29, KÖLL Claus wrote:
Hi Nelson,

1) Reading the source code, jackrabbit is using LazyTextExtractorField (and 
other classes) to execute the extraction in a separate thread.
Doesn't it do exactly what I want? But, even so I waited 3 hours and the 
repository wasn't initialized and ready to use. Is it normal?
First .. yes this is normal ..
and yes you are right about extraction in a separate thread .. this happens on 
session.save() operation. If you start the repository it will start to re-index 
it if the index is not present.
In that way jackrabbit does not separate between full text indexing and 
"normal" node/property indexing. So the start will take much time
depending on your content.

2)  What I'm planning to do is the best approach? Did anybody make something 
similar?
One way to handle such index recovering is to create a cluster. Let's assume 
you would have 2 cluster members where one is the primary and the other one is 
a hot standby member.
If you have problems with the index on the primary cluster member you could 
copy the index folder from the standby cluster member.
If you like you could re-index the repository on your standby member while the 
primary is running.

greets
claus
