On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog <goks...@gmail.com> wrote:
> You don't need to optimize, only commit.

OK, thanks for the tip, Lance.  I thought the "too many open files"
problem was because I wasn't optimizing/merging frequently enough.  My
understanding of your suggestion is that commits also trigger merging,
and since I am only building the index, not querying or updating it, I
don't need to optimize.
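
For the archives, here is roughly what my indexing loop reduces to (a
minimal SolrJ sketch; the URL, field names, and batch size here are
placeholders, not my real setup):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Bulk-load documents, committing periodically and never calling optimize().
    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            for (int i = 0; i < 100000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);                // placeholder fields
                doc.addField("text", "body of document " + i);
                server.add(doc);
                if (i % 1000 == 0) {
                    // commit flushes buffered docs; segment merges run per the
                    // merge policy, so there is no need to force them
                    server.commit();
                }
            }
            server.commit();  // final commit; no server.optimize()
        }
    }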

> This means that the JVM spends 98% of its time doing garbage
> collection. This means there is not enough memory.

I'll increase the JVM heap to 4G, decrease the documentCache size to 5,
and try again.
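
Concretely, that means starting the JVM with -Xmx4g and shrinking the
documentCache entry in solrconfig.xml to something like this (assuming
the stock solr.LRUCache; since I'm only indexing, autowarming is off):

    <documentCache class="solr.LRUCache"
                   size="5"
                   initialSize="5"
                   autowarmCount="0"/>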

> I made a mistake - the bug in Lucene is not about PDFs - it happens
> with every field in every document you index in any way- so doing this
> in Tika outside Solr does not help. The only trick I can think of is
> to alternate between indexing large and small documents. This way the
> bug does not need memory for two giant documents in a row.

I've checked out and built Solr from branch_3x with the
tika-0.8-SNAPSHOT patch.  (Earlier I was having trouble with Tika
crashing too frequently.)  I've confirmed that LUCENE-2387 is fixed in
this branch, so hopefully I won't run into that this time.
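
In case it helps anyone else, the checkout and build went roughly like
this (repository path and ant target from memory, so double-check them
before relying on this):

    svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x
    cd branch_3x/solr
    ant example    # builds Solr and sets up the example server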

> Also, do not query the indexer at all. If you must, don't do sorted or
> faceting requests. These eat up a lot of memory that is only freed
> with the next commit (index reload).

Good to know, though I have not been querying the index and definitely
haven't ventured into faceted requests yet.

The advice is much appreciated,

Jim
