It's probably a good idea to optimize. How are you re-indexing, anyway? DIH
(DataImportHandler)? Custom code? post.jar?

Optimizing manually is just a matter of sending an optimize command to the update handler (e.g. with curl); see:
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
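Here is a minimal Python sketch of the same request. It assumes Solr runs at the default http://localhost:8983/solr with the XML update handler at /update, so adjust the URL for your install. It POSTs the `<optimize/>` update message from the wiki page above. The `build_optimize_request` and `send` helper names are hypothetical.

```python
import urllib.request

# Assumption: default single-core Solr URL; change it to match your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_optimize_request(update_url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST carrying an <optimize/> XML update message."""
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 3600.0) -> str:
    """Send the request and return Solr's response body as text.

    An optimize rewrites the whole index, so a long timeout is allowed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The curl equivalent is roughly `curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'`. To optimize automatically, one option is to call `send(build_optimize_request())` from a scheduled job, such as cron during a quiet period. That scheduling approach is only a suggestion, not a Solr setting.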

Best
Erick

On Fri, Nov 12, 2010 at 12:13 PM, Claudio Devecchi <cdevec...@gmail.com> wrote:

> Hi Tom, thanks for your explanation.
>
> Would you recommend leaving the index this way, or can I configure Solr to
> optimize automatically?
>
> Thanks
>
> On Fri, Nov 12, 2010 at 2:39 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>
> > Hi Claudio,
> >
> > What's happening when you re-index the documents is that Solr/Lucene
> > implements an update as a delete of the old version plus an add of the
> > new one. Because of the nature of inverted indexes, physically removing a
> > document would require rewriting the index segment that contains it (for
> > an optimized, single-segment index, that is the entire index). To avoid
> > that rewrite each time one document is deleted, deletes are recorded as a
> > list of deleted internal Lucene ids. Documents aren't actually removed
> > from the index until their segment is merged or an optimize occurs.
> >
> > maxDoc is the total number of documents in the index, including those
> > that are marked as deleted.
> > numDocs is the actual number of undeleted (searchable) documents.
> >
> > If you run an optimize, the index will be rewritten, the index size will
> > go down, and numDocs will equal maxDoc.
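Here is a toy Python model of the mechanism Tom describes. It is purely illustrative: `Segment` and `ToyIndex` are hypothetical names, and real Lucene keeps its per-segment deletions in its own on-disk format, not a Python set. Re-adding an existing id marks the old copy as deleted instead of rewriting its segment. So maxDoc grows while numDocs stays put until `optimize()` rewrites everything into one segment.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A write-once segment: its doc ids plus the positions marked deleted."""
    ids: list[str]
    deleted: set[int] = field(default_factory=set)

    def live_ids(self) -> list[str]:
        return [d for pos, d in enumerate(self.ids) if pos not in self.deleted]


@dataclass
class ToyIndex:
    segments: list[Segment] = field(default_factory=list)

    def add_batch(self, ids: list[str]) -> None:
        """Index a batch; an id that already exists is updated as delete + add."""
        incoming = set(ids)
        for seg in self.segments:
            for pos, doc_id in enumerate(seg.ids):
                if doc_id in incoming:
                    seg.deleted.add(pos)  # only marked; the segment is not rewritten
        self.segments.append(Segment(list(ids)))

    @property
    def max_doc(self) -> int:
        # Counts every stored doc, deleted or not.
        return sum(len(seg.ids) for seg in self.segments)

    @property
    def num_docs(self) -> int:
        # Counts only docs that are still searchable.
        return sum(len(seg.ids) - len(seg.deleted) for seg in self.segments)

    def optimize(self) -> None:
        """Rewrite all live docs into a single segment, dropping deleted ones."""
        live = [d for seg in self.segments for d in seg.live_ids()]
        self.segments = [Segment(live)]
```

Indexing `["a", "b", "c"]` twice gives max_doc 6 and num_docs 3. After `optimize()`, both are 3. The sketch leaves out ordinary background merges, which also drop deleted docs, but only from the segments being merged.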
> >
> > Tom Burton-West
> >
> > -----Original Message-----
> > From: Claudio Devecchi [mailto:cdevec...@gmail.com]
> > Sent: Friday, November 12, 2010 10:50 AM
> > To: Lista Solr
> > Subject: Doubt about index size
> >
> > Hi everybody,
> >
> > I'm doing some indexing tests on Solr 1.4.1 and there is one thing I
> > don't understand; let me try to explain.
> >
> > I have 1.2 million XML files that I'm indexing. When I index them for the
> > first time, my index size is around 3 GB, and in my statistics on
> > http://localhost:8983/solr/admin/stats.jsp there are two entries:
> >
> > numDocs : 1120171
> > maxDoc : 1120171
> >
> > So far so good, but if I update the index by reindexing all the same
> > 1120171 documents, I get the stats below:
> >
> > numDocs : 1120171
> > maxDoc : 2240342
> >
> > ... and my index size grows to around 6 GB.
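Here is the arithmetic on the stats above, as a small Python sketch. The `deleted_stats` helper is a hypothetical name. After the reindex, exactly half of maxDoc consists of deleted copies that still sit on disk, which is consistent with the size roughly doubling from about 3 GB to about 6 GB. Disk usage need not scale exactly with document counts.

```python
def deleted_stats(num_docs: int, max_doc: int) -> tuple[int, float]:
    """Return (deleted docs still stored, fraction of maxDoc that is deleted)."""
    if max_doc == 0:
        return 0, 0.0
    deleted = max_doc - num_docs
    return deleted, deleted / max_doc


# Stats reported in this thread.
first_run = deleted_stats(1120171, 1120171)   # nothing deleted yet
after_reindex = deleted_stats(1120171, 2240342)  # every old copy marked deleted
```

Here `first_run` is `(0, 0.0)` and `after_reindex` is `(1120171, 0.5)`. An optimize should bring maxDoc back to 1120171, and the index size should drop back toward that of the first run.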
> >
> > Why does this happen? What happens to the index size if I have the same
> > number of searchable docs?
> >
> > Does anybody know?
> >
> > Thanks
> >
>
>
>
> --
> Claudio Devecchi
> flickr.com/cdevecchi
>
