An optimize uses a lot of CPU and I/O because it has to rewrite your index, so
only run it when necessary.
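
One rough way to judge "when necessary" is to look at how much of the index is taken up by deleted documents, using the numDocs and maxDoc figures from the stats page quoted further down. This is only a sketch: the 20% cutoff and the function names are made up for illustration and are not Solr settings or defaults.

```python
def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Share of documents in the index that are only marked as deleted."""
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


# Hypothetical cutoff chosen for illustration; not a Solr setting or default.
OPTIMIZE_THRESHOLD = 0.2


def should_optimize(num_docs: int, max_doc: int,
                    threshold: float = OPTIMIZE_THRESHOLD) -> bool:
    """True when deleted documents make up at least `threshold` of the index."""
    return deleted_fraction(num_docs, max_doc) >= threshold


# Figures reported in this thread after the full reindex.
print(deleted_fraction(1120171, 2240342))  # prints 0.5
```

With the numbers from the thread, half of the index is deleted documents, which is about as strong a case for an optimize as you will get.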

You can just use curl to send an optimize message to Solr when you are ready.

See:
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL
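
If you would rather script it than call curl by hand, the same message can be built in Python. A minimal sketch, assuming Solr is reachable at http://localhost:8983/solr (the host and port from the stats URL in this thread) with the update handler at /update; the function name is hypothetical. It posts the `<optimize/>` update message described on the wiki page above.

```python
import urllib.request

# Assumed location of the update handler; adjust for your host, port, and core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_optimize_request(update_url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST carrying an <optimize/> update message."""
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_optimize_request()
    print(req.get_method(), req.full_url)
    # To actually trigger the optimize, uncomment the lines below.
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

The request is built separately from sending it so you can check the URL first; the optimize itself can run for a long time on a large index.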

Tom
-----Original Message-----
From: Claudio Devecchi [mailto:cdevec...@gmail.com] 
Sent: Friday, November 12, 2010 12:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Doubt about index size

Hi Tom, thanks for your explanation,

Do you recommend leaving the index this way? Or can I configure it to
optimize automatically?

tks

On Fri, Nov 12, 2010 at 2:39 PM, Burton-West, Tom <tburt...@umich.edu> wrote:

> Hi Claudio,
>
> What's happening when you re-index the documents is that Solr/Lucene
> implements an update as a delete plus a new add.  Because of the nature of
> inverted indexes, physically removing a document requires rewriting the
> index.  To avoid rewriting the entire index each time one document is
> deleted, deletes are recorded as a list of deleted internal Lucene ids.
> Documents aren't actually removed from the index until the segment that
> holds them is merged or an optimize occurs.
>
> maxDoc is the total number of documents indexed, including those that are
> marked as deleted.
> numDocs is the actual number of undeleted documents.
>
> If you run an optimize, the index will be rewritten, the index size will go
> down, and numDocs will equal maxDoc.
>
> Tom Burton-West
>
> -----Original Message-----
> From: Claudio Devecchi [mailto:cdevec...@gmail.com]
> Sent: Friday, November 12, 2010 10:50 AM
> To: Lista Solr
> Subject: Doubt about index size
>
> Hi everybody,
>
> I'm doing some indexing tests on Solr 1.4.1 and there is one thing I don't
> understand; let me try to explain.
>
> I have 1.2 million XML files that I'm indexing. When I index them for the
> first time, my index size is around 3 GB, and the statistics page at
> http://localhost:8983/solr/admin/stats.jsp shows these two entries:
>
> numDocs : 1120171
> maxDoc : 1120171
>
> Up to here everything is fine, but if I update the index by reindexing the
> same 1120171 documents, I get the stats below:
>
> numDocs : 1120171
> maxDoc : 2240342
>
> ... and my index size grows to around 6 GB.
>
> Why does this happen? Why does the index size change if I have the same
> number of searchable docs?
>
> Does anybody know?
>
> Tks
>



-- 
Claudio Devecchi
flickr.com/cdevecchi
