On Mon, 29 Nov 2010 03:07 -0800, "stockii" <st...@shopgate.com> wrote:
> 
> Hello.
> 
> I have ~37 million docs that I want to index.
> 
> When I start a full-import, I import only 2 million docs at a time,
> for better control over Solr and space/heap ....
> 
> So when I import 2 million docs and Solr starts the commit and the
> optimize, my used disk space jumps into the sky. My reaction: restart
> Solr, and the used space goes down.
> 
> Why is Solr using so much space?
> 
> Can I optimize that?

What do you mean "into the sky"? What percentage increase are you
seeing?

I'd expect it to double at least. I've heard it suggested that you
should have three times the usual space available for an optimise.

Remember, when your index is optimising, you'll want to keep the
original index online and available for searches, so you'll have at
least two copies of your index on disk during an optimise.
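
As a rough illustration of that arithmetic, here is a minimal Python sketch.
The function names and the example sizes are made up for illustration, and
the factor of 2 or 3 is only the rule of thumb mentioned above, not a
guarantee from Solr:

```python
import os
import shutil


def dir_size(path):
    """Total size in bytes of all files under path (e.g. the index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def peak_space(index_bytes, factor=2):
    """Estimated peak disk usage during an optimise, using a rule-of-thumb factor."""
    return index_bytes * factor


def has_headroom(index_dir, factor=3):
    """True if free space on the index volume covers the estimated extra usage."""
    size = dir_size(index_dir)
    extra = peak_space(size, factor) - size  # space needed on top of the current index
    return shutil.disk_usage(index_dir).free >= extra
```

For a hypothetical 50 GB index, peak_space(50, 2) gives 100 and
peak_space(50, 3) gives 150, so you would want 50 to 100 GB free before
starting the optimise.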

Also, it is my understanding that if you commit infrequently, you won't
need to optimise immediately. There's nothing to stop you importing your
entire corpus, then doing a single commit. That will leave you with only
one segment (or at most two - one that existed before and was empty, and
one containing all of your documents). The net result is that you don't
need to optimise at that point.
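
A minimal sketch of the "import everything, then commit once" idea, in
Python. It assumes documents are posted as XML to Solr's update handler
rather than through a full-import; the URL, the batch size and the function
names are assumptions for illustration. How many segments you actually end
up with may also depend on your indexing settings, and an autocommit
configured in solrconfig.xml would commit earlier than this code does:

```python
import urllib.request
from itertools import islice
from xml.sax.saxutils import escape, quoteattr

# Assumed location of the update handler; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=SOLR_UPDATE_URL):
    """POST one XML message to the update handler and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def batches(iterable, size):
    """Yield lists of at most `size` items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_all(docs, batch_size=1000, send=post_xml):
    """Send every batch without committing, then issue a single commit."""
    sent = 0
    for batch in batches(docs, batch_size):
        send(build_add_xml(batch))
        sent += 1
    send("<commit/>")  # the only commit, after the whole corpus is sent
    return sent
```

The `send` parameter exists only so the logic can be tried without a
running Solr, for example import_all(docs, send=print).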

Note - I'm no Solr guru, so I could be wrong about some of the above -
I'm happy to be corrected.

Upayavira
