On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog <goks...@gmail.com> wrote:

> You don't need to optimize, only commit.
OK, thanks for the tip, Lance. I thought the "too many open files" problem arose because I wasn't optimizing/merging frequently enough. My understanding of your suggestion is that commit also triggers merging, and since I am only building the index, not querying or updating it, I don't need to optimize.

> This means that the JVM spends 98% of its time doing garbage
> collection. This means there is not enough memory.

I'll increase the memory to 4G, decrease the documentCache to 5, and try again.

> I made a mistake - the bug in Lucene is not about PDFs - it happens
> with every field in every document you index in any way - so doing this
> in Tika outside Solr does not help. The only trick I can think of is
> to alternate between indexing large and small documents. This way the
> bug does not need memory for two giant documents in a row.

I've checked out and built Solr from branch_3x with the tika-0.8-SNAPSHOT patch. (Earlier I was having trouble with Tika crashing too frequently.) I've confirmed that LUCENE-2387 is fixed in this branch, so hopefully I won't run into it this time.

> Also, do not query the indexer at all. If you must, don't do sorted or
> faceting requests. These eat up a lot of memory that is only freed
> with the next commit (index reload).

Good to know, though I have not been querying the index and definitely haven't ventured into faceted requests yet.

The advice is much appreciated,
Jim
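P.S. For anyone following this thread, the two tuning changes I described (raising the heap to 4G and shrinking the documentCache to 5 entries) look roughly like this in my setup; treat the exact values and start command as an example, not a recommendation. The heap goes on the JVM command line, e.g. `java -Xmx4g -jar start.jar`, and the cache is set in solrconfig.xml:

```xml
<!-- solrconfig.xml: a tiny documentCache so cached documents
     don't pin large stored fields in the heap while bulk indexing -->
<documentCache class="solr.LRUCache"
               size="5"
               initialSize="5"
               autowarmCount="0"/>
```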