Question on how index works - runs out of disk space!

sundar shankar Wed, 10 Sep 2008 10:06:18 -0700

Hi All,
We have a cluster of 4 servers for the application and just one server 
for Solr. We have just about 2 million docs to index, and we never bothered 
to cluster the Solr environment because Solr was delivering adequate 
performance with the current setup. Of late we discovered a problem, and I 
am not sure what the right way to go about it would be.


We have a cron job that runs from the application and does a nightly index of 
data added from another enterprise application. The index job re-indexes all 
courses, whether or not they are already indexed. We observed that the job was 
starting up on all 4 servers at about the same time. All 4 servers point to the 
same Solr box, so the same data is apparently added to the Solr box 4 times. 
There is an update command for every 10,000 records fetched from the database 
and a commit at the end of the full job.
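
To make the shape of the job concrete, here is a minimal sketch of the batching described above: one Solr XML <add> message per 10,000 records and a single <commit/> at the end. It only builds the messages and does not post them anywhere. The function names and the "id" and "title" fields are hypothetical, chosen for illustration, and are not taken from our actual job.

```python
from xml.sax.saxutils import escape, quoteattr

BATCH_SIZE = 10000  # from the description above: one update per 10,000 records


def build_add_message(docs):
    """Build one Solr XML <add> message from a list of field dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr() returns the attribute value already wrapped in quotes.
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def build_job_messages(records, batch_size=BATCH_SIZE):
    """Return, in order, the XML messages one run of the job would send."""
    messages = []
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            messages.append(build_add_message(batch))
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        messages.append(build_add_message(batch))
    # A single commit at the end of the full job.
    messages.append("<commit/>")
    return messages
```

If this same sequence is produced independently on each of the 4 application servers, the Solr box receives every document 4 times per night.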

The surprising thing I noticed was that even though there is a primary key 
defined in the Solr schema, the size of the data folder seems to increase 
incrementally and is causing the Solr server to run out of disk space. I 
upgraded to version 1.3 about a month ago, and I guess the problem might be 
something that started occurring after that upgrade. 

The index size for millions of docs on a clustered dev environment used to be 
about 520 MB, and it is about that size the first time we index all the 
courses. The current size for the same number of docs (taken from the stats 
page) is 6.5 GB. 
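
For anyone wanting to track this, a rough sketch of how the data folder could be measured and compared against the 520 MB baseline. The helper names are hypothetical, and the path passed in would be wherever the Solr data folder lives on the box. Counting 1 GB as 1024 MB, the figures above work out to about a 12.8x increase.

```python
import os

# Roughly 520 MB, the first-run index size quoted above.
BASELINE_BYTES = 520 * 1024 * 1024


def directory_size(path):
    """Sum the sizes, in bytes, of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def growth_factor(current_bytes, baseline_bytes=BASELINE_BYTES):
    """Return how many times larger the current size is than the baseline."""
    return current_bytes / baseline_bytes
```

Running directory_size() on the data folder after each nightly job and logging growth_factor() would show whether the size climbs steadily or jumps after particular runs.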

I am not sure what has changed, or whether there is any config change that I 
could use. The write lock is disabled on dev with lock-type = single. I am not 
sure if this matters. 

-Sundar

