Indexer behaves differently while starting the repository?

Cech. Ulrich Wed, 16 Mar 2011 03:14:37 -0700

Hello,

I have a "funny" problem with the Jackrabbit indexer mechanism. We store 
text files with sizes between 2 and 170 MB. Xmx is set to 850, and the 
indexer works without problems when the stream is stored in the Jackrabbit 
data store (FileDataStore). Up to this point everything is fine.
But if I delete the workspace index directory so that Jackrabbit rebuilds it 
on the next startup, the indexer starts, processes some files, and then fails 
with a java.lang.OutOfMemoryError: Java heap space.


Can someone tell me what the difference is between "indexing while storing" 
and "(re)indexing at repository startup"?

Thank you very much for any hint,
Best regards,
Ulrich

I have appended the stack trace below:

2011-03-16 11:04:36,916 WARN : [LazyTextExtractorField] Failed to extract text from a binary property
java.lang.OutOfMemoryError: Java heap space
            at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
            at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
            at java.lang.StringBuilder.append(StringBuilder.java:190)
            at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.characters(LazyTextExtractorField.java:191)
            at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
            at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
            at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
            at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
            at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
            at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
            at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
            at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
            at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
            at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
            at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
            at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:132)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
            at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:192)
            at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:174)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
            at java.util.concurrent.FutureTask.run(FutureTask.java:123)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
            at java.lang.Thread.run(Thread.java:595)
