https://issues.apache.org/jira/browse/LUCENE-2387

There is a "memory leak" that keeps the previous PDF's binary file image
in memory while the next binary image is being processed. Committing
after every extraction clears up this "memory leak".
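
As a rough sketch of committing on every extraction, the request can carry
commit=true on the extract call itself. The base URL, the /update/extract
handler path, and the helper names below are assumptions about a default
setup, not something taken from this thread; adjust them to your install.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the Solr instance; change to match your deployment.
SOLR_URL = "http://localhost:8983/solr"


def extract_url(doc_id, commit=True, base_url=SOLR_URL):
    """Build the extract request URL for one document (hypothetical helper)."""
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit as part of this request.
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def index_file(path, doc_id, content_type="application/pdf"):
    """POST one file's raw bytes to the extract handler and commit."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        extract_url(doc_id),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Committing per document is slow, so this trades indexing throughput for
lower memory use and less lost work after a crash.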

This is fixed in trunk and should make it into a bug-fix release,
Solr 1.4.1, if one happens.

Lance

On Wed, Jun 9, 2010 at 10:13 AM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> On Fri, Jun 4, 2010 at 3:14 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>> : That is still really small for 5MB documents. I think the default solr
>> : document cache is 512 items, so you would need at least 3 GB of memory
>> : if you didn't change that and the cache filled up.
>>
>> that assumes that the extracted text tika extracts from each document is
>> the same size as the original raw files *and* that he's configured that
>> content field to be "stored" ... in practice if you only stored=true the
>
> Most times the extracted text is much smaller, though there are
> occasional zip files that may expand in size (and in an unrelated
> note, multifile zip archives cause tika 0.7 to hang currently).
>
>> fast, 128MB is really, really, really small for a typical Solr instance.
>
> In any case I bumped up the heap to 3G as suggested, which has helped
> stability.  I have found that in practice I need to commit every
> extraction because a crash or error will wipe out all extractions
> after the last commit.
>
>> if you are only seeing one log line per request, then you are just looking
>> at the "request" log ... there should be more logs with messages from all
>> over the code base with various levels of severity -- and using standard
>> java log level controls you can turn these up/down for various components.
>
> Unfortunately, I'm not very familiar with java deploys so I don't know
> where the standard controls are yet.  As a concrete example, I do see
> INFO level logs, but haven't found a way to move up to DEBUG level in
> either solr or tomcat.  I was hopeful debug statements would point to
> where extraction/indexing hangs were occurring.  I will keep poking
> around, thanks for the tips.
>
> Jim
>



-- 
Lance Norskog
goks...@gmail.com
