index size before and after commit

2009-10-01 Thread Phillip Farber
I am trying to automate a build process that adds documents to 10 shards over 5 machines and need to limit the size of a shard to no more than 200GB, because I only have 400GB of disk available to optimize a given shard. Why does the size (du) of an index typically decrease after a commit? I've ...

Re: index size before and after commit

2009-10-01 Thread Grant Ingersoll
It may take some time before resources are released and garbage collected, so that may be part of the reason why things hang around and du doesn't report much of a drop. On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote: ...
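
To make the "things hang around" point concrete, here is a minimal sketch against the Lucene 2.9-era Java API; the index path and field are made up for illustration. The idea is that a reader still open on the pre-merge commit keeps the superseded segment files in use, so disk usage lags behind the commit until that reader is closed or reopened:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class HeldSegmentsDemo {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/tmp/demo-index"));
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            Document doc = new Document();
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            writer.commit();

            // A reader opened on the current commit point...
            IndexReader oldReader = IndexReader.open(dir, true);

            // ...keeps that commit's files in use while the writer merges
            // everything into new segment files.
            writer.addDocument(doc);
            writer.optimize();
            writer.commit();

            // Until the old reader is closed (or reopened), the OS cannot
            // reclaim the space of the superseded segments (on Windows,
            // Lucene cannot even delete the files yet), so reported disk
            // usage trails the commit.
            oldReader.close();
            writer.close();
        }
    }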

Re: index size before and after commit

2009-10-01 Thread Mark Miller
Phillip Farber wrote: > I am trying to automate a build process that adds documents to 10 shards over 5 machines and need to limit the size of a shard to no more than 200GB because I only have 400GB of disk available to optimize a given shard. > Why does the size (du) of an index typically ...

Re: index size before and after commit

2009-10-01 Thread Mark Miller
Whoops - the way I have mail come in, it's not easy to tell if I'm replying to the Lucene or Solr list ;) The way Solr works with Searchers and reopen, it shouldn't run into a situation that requires greater than 2x to optimize. I won't guarantee it ;) But based on what I know, it shouldn't happen under normal ...
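
Purely as an illustration of that 2x bound, using the original poster's numbers: an optimize rewrites essentially all of a shard's live data into new segment files before the old ones can be dropped, so assuming few deletions the peak usage is roughly

    200GB (existing segments) + ~200GB (new merged segments) = ~400GB peak

which is exactly the headroom the poster has. Once the merge finishes and the old searcher is released, usage falls back to about 200GB.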

Re: index size before and after commit

2009-10-01 Thread Walter Underwood
Here is how you need 3X. First, index everything and optimize. Then delete everything and reindex without any merges. You have one full-size index containing only deleted docs, one full-size index containing reindexed docs, and need that much space for a third index. Honestly, disk is cheap ...
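
Walter's worst case, again illustrated with the poster's 200GB shard (the numbers are only an example of what can sit on disk at the same time):

      200GB  old optimized index, now containing only deleted docs
    + 200GB  freshly reindexed copy of the same docs
    + 200GB  new segments written while optimizing that combined index
    = 600GB  peak, i.e. 3X the steady-state index size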

Re: index size before and after commit

2009-10-01 Thread Mark Miller
Nice one ;) It's not technically a case where optimize requires > 2x, though - just in case the user asking gets confused. It's a case unrelated to optimize that can grow your index. Then you need < 2x for the optimize, since you won't copy the deletes. It also requires that you jump through hoops to delete everything ...

Re: index size before and after commit

2009-10-01 Thread Mark Miller
bq. and reindex without any merges. That's actually quite a hoop to jump through as well - though if you're determined and you have tons of RAM, it's somewhat doable. Mark Miller wrote: ...
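
As a sketch of why "reindex without any merges" takes determination and RAM (Lucene 2.9-era API; the path and values are illustrative assumptions, not recommendations), the usual trick is a huge mergeFactor plus a large RAM buffer, so segments are flushed rarely and never merged:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class NoMergeReindex {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/tmp/demo-index")),
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setMergeFactor(10000);     // so many segments are allowed per level
                                              // that, in practice, no merge is triggered
            writer.setRAMBufferSizeMB(1024);  // flush rarely: fewer, larger segments
            // ... re-add every document here ...
            writer.commit();                  // with no merges, the deletes from the old
                                              // generation are never squeezed out at this step
            writer.close();
        }
    }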

Re: index size before and after commit

2009-10-01 Thread Walter Underwood
I've now worked on three different search engines and they all have a 3X worst case on space, so I'm familiar with this case. --wunder On Oct 1, 2009, at 7:15 AM, Mark Miller wrote: ...

Re: index size before and after commit

2009-10-01 Thread Lance Norskog
I've heard there is a new "partial optimize" feature in Lucene, but it is not mentioned in the Solr or Lucene wikis, so I cannot advise you on how to use it. On a previous project we had a 500GB index for 450m documents. It took 14 hours to optimize. We found that Solr worked well (given enough RAM for ...

Re: index size before and after commit

2009-10-01 Thread Lance Norskog
Ha! Searching "partial optimize" on http://www.lucidimagination.com/search, we discover SOLR-603, which adds the 'maxSegments' option to the optimize command. The text does not include the word 'partial'. It's documented on http://wiki.apache.org/solr/UpdateXmlMessages. The command gives a number of Lucene segments ...
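
A hedged SolrJ sketch of that partial optimize, assuming the 1.4-era client and its optimize(waitFlush, waitSearcher, maxSegments) overload; the URL and segment count are placeholders:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class PartialOptimize {
        public static void main(String[] args) throws Exception {
            // URL is an assumption; point it at your own shard.
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // SOLR-603: merge the index down to at most 16 segments instead
            // of a single segment.
            solr.optimize(true, true, 16);

            // The equivalent raw update message (see UpdateXmlMessages on the
            // wiki) would be: <optimize maxSegments="16"/>
        }
    }

Merging down to, say, 16 segments instead of 1 rewrites far less data, so it should need much less time and temporary disk than a full optimize.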