I'll make a very wild guess and say that it's possible for this to happen if your dates are very granular (down to milliseconds). All of a sudden you probably got 500,000 new terms there. Wild guess.
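To make the guess concrete: if the date field is indexed as well as stored (the thread does not say), every distinct value becomes its own term. Below is a minimal sketch, on synthetic timestamps only, of how many distinct values 500,000 documents produce at millisecond versus day granularity. The one-year spread, the seed, and all names are assumptions for illustration, not taken from the actual index.

```python
import random

N_DOCS = 500_000                 # document count mentioned in the thread
MS_PER_DAY = 86_400_000
START_MS = 1_199_145_600_000     # 2008-01-01T00:00:00Z in epoch milliseconds
SPAN_MS = 365 * MS_PER_DAY       # assumed one-year spread (hypothetical)

rng = random.Random(0)           # fixed seed so the run is repeatable
# Synthetic stand-in for per-document dates; not real data from the index.
timestamps = [START_MS + rng.randrange(SPAN_MS) for _ in range(N_DOCS)]

# Each distinct value would be a separate term if the field is indexed.
distinct_ms = len(set(timestamps))
distinct_days = len({ts // MS_PER_DAY for ts in timestamps})

print(f"distinct millisecond values: {distinct_ms:,}")
print(f"distinct day values:         {distinct_days:,}")
```

Under these assumptions nearly every document gets a unique millisecond value, while truncating to the day leaves a few hundred at most. Whether that accounts for the growth seen here is a separate question.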
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Phillip Farber <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, November 6, 2008 11:08:18 AM
> Subject: Re: Huge increase in index size adding just 2 fields
>
> May I ask again whether an index size increase from 120GB to 166GB is
> expected simply by adding a stored date and a stored repeating string
> field of length perhaps 20, with roughly 2 values per doc on average,
> for 500,000 docs? The doc is a large body of OCR and the position index
> dominates due to the large number of terms.
>
> Thanks,
>
> Phil
>
>
> Phillip Farber wrote:
> >
> > Hi,
> >
> > We're indexing a lot of dirty OCR. So the index is really huge due to
> > the size of the position file. We still get ok response time though,
> > with a median of 100ms. Phrase queries are a different matter,
> > obviously. But as we add a couple of fields, we're seeing some really
> > large increases in index size that do not make sense.
> >
> > Our 500,000 document index is 120G. Its simple schema is:
> >
> >
> >
> >
> >
> >
> > required="true"/>
> >
> > We added the following 2 fields to the above schema as follows:
> >
> >
> > multiValued="true"/>
> >
> > where the "hlb" field consists of not more than 3-4 strings such as
> > "Social Science".
> >
> > Our 500,000 document index size increased to 166G! This seems
> > completely wrong. Looking at the directory listings for each case, it
> > appears every one of the files grew in size.
> >
> > How can this be?
> >
> > Phil
> >
> > ===
> >
> > 120G index:
> >
> > -rw-r--r-- 1 tomcat admin     81023261 Sep 24 06:00 _fj.fdt
> > -rw-r--r-- 1 tomcat admin      4000072 Sep 24 06:00 _fj.fdx
> > -rw-r--r-- 1 tomcat admin           33 Sep 24 06:00 _fj.fnm
> > -rw-r--r-- 1 tomcat admin  14069125169 Sep 24 06:16 _fj.frq
> > -rw-r--r-- 1 tomcat admin      1500031 Sep 24 06:16 _fj.nrm
> > -rw-r--r-- 1 tomcat admin 109247382360 Sep 24 08:25 _fj.prx
> > -rw-r--r-- 1 tomcat admin     58677668 Sep 24 08:25 _fj.tii
> > -rw-r--r-- 1 tomcat admin   4319853217 Sep 24 08:32 _fj.tis
> > -rw-r--r-- 1 tomcat admin           42 Sep 24 08:32 segments_fo
> > -rw-r--r-- 1 tomcat admin           20 Sep 24 08:32 segments.gen
> >
> > 166G index (+ 2 fields)
> >
> > -rw-r--r-- 1 tomcat admin    113530692 Oct 21 10:42 _fh.fdt
> > -rw-r--r-- 1 tomcat admin      3960256 Oct 21 10:42 _fh.fdx
> > -rw-r--r-- 1 tomcat admin           44 Oct 21 10:42 _fh.fnm
> > -rw-r--r-- 1 tomcat admin  15242830112 Oct 21 12:58 _fh.frq
> > -rw-r--r-- 1 tomcat admin      1485100 Oct 21 12:58 _fh.nrm
> > -rw-r--r-- 1 tomcat admin 117927610810 Oct 21 12:58 _fh.prx
> > -rw-r--r-- 1 tomcat admin     72760439 Oct 21 12:58 _fh.tii
> > -rw-r--r-- 1 tomcat admin   5337669551 Oct 21 12:58 _fh.tis
> > -rw-r--r-- 1 tomcat admin           42 Oct 21 12:58 segments_fk
> > -rw-r--r-- 1 tomcat admin           20 Oct 21 12:58 segments.gen
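
The two directory listings above can be compared file by file to see where the growth actually lands. The following is a rough sketch: the byte counts are copied from the listings, while the dictionary keys and the `growth` helper are names made up for this illustration.

```python
# File sizes in bytes, copied from the two directory listings above.
# Keys are file extensions, since the segment names (_fj vs _fh) differ.
before = {
    "fdt": 81023261, "fdx": 4000072, "fnm": 33,
    "frq": 14069125169, "nrm": 1500031, "prx": 109247382360,
    "tii": 58677668, "tis": 4319853217,
    "segments_N": 42, "segments.gen": 20,
}
after = {
    "fdt": 113530692, "fdx": 3960256, "fnm": 44,
    "frq": 15242830112, "nrm": 1485100, "prx": 117927610810,
    "tii": 72760439, "tis": 5337669551,
    "segments_N": 42, "segments.gen": 20,
}


def growth(old, new):
    """Return {name: (delta_bytes, percent_change)} for files in both listings."""
    return {
        name: (new[name] - old[name], 100.0 * (new[name] - old[name]) / old[name])
        for name in old
        if name in new
    }


report = growth(before, after)

# Print the files ordered by absolute growth, largest first.
for name, (delta, pct) in sorted(report.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:>12} {delta:>15,d} bytes {pct:+7.1f}%")

total_before = sum(before.values())
total_after = sum(after.values())
print(f"{'total':>12} {total_after - total_before:>15,d} bytes "
      f"({total_before / 2**30:.1f} GiB -> {total_after / 2**30:.1f} GiB)")
```

By this arithmetic the listed files sum to roughly 119 GiB and 129 GiB, with the .prx file accounting for most of the difference. So the second listing may not cover everything behind the 166G figure. The .fdx and .nrm files are also slightly smaller in the second listing.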