Have you tried performing an "optimize"? Solr doesn't seem to fully integrate all updates into a single index until an optimize is performed.
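For reference, here is a minimal sketch of triggering an optimize by posting Solr's XML `<optimize/>` update message to the update handler. The base URL (`http://localhost:8983/solr`) and the helper names `build_optimize_request` and `optimize` are assumptions for illustration, so adjust them to your setup. An optimize rewrites the index, so it temporarily needs extra free disk space while it runs.

```python
import urllib.request


def build_optimize_request(solr_base_url):
    """Build a POST request carrying Solr's XML <optimize/> update message.

    solr_base_url is an assumed base such as "http://localhost:8983/solr".
    """
    url = solr_base_url.rstrip("/") + "/update"
    return urllib.request.Request(
        url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def optimize(solr_base_url, timeout=600):
    """Send the optimize request and return the HTTP status code.

    The generous default timeout is a guess; an optimize on a large
    index can take a long time.
    """
    request = build_optimize_request(solr_base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host and port; replace with the real Solr box.
    print(optimize("http://localhost:8983/solr"))
```

Comparing the size of the data folder before and after the call should show whether the growth comes from old copies of re-added documents that have not been merged away yet.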
Jason

On Wed, Sep 10, 2008 at 1:05 PM, sundar shankar <[EMAIL PROTECTED]> wrote:

> Hi All,
> We have a cluster of 4 servers for the application and just one
> server for Solr. We have just about 2 million docs to index and we never
> bothered to make the Solr environment clustered, as Solr was delivering
> performance with the current setup itself. Of late we discovered a
> problem and I am not sure what would be the right way to go about this.
>
> We have a cron that runs from the application and does a nightly index of
> data added from another enterprise application. The index job indexes all
> courses, whether already indexed or not, and reindexes them. We observed
> that the job was starting up on all 4 servers at about the same time. All 4
> servers point to the same Solr box, and the same data is apparently added
> to the Solr box 4 times. There is an update command for every 10,000
> records fetched from the database and a commit at the end of the full job.
>
> The surprising thing that I noticed was that even though there is a primary
> key defined in the Solr schema, the size of the data folder seems to
> increase incrementally and is causing the Solr server to run out of disk
> space. I upgraded to version 1.3 about a month back, and I guess the
> problem might be something that started occurring after that update.
>
> The index size for millions of docs on a clustered dev used to be about
> 520 MB, and it is about that much the first time we index all the courses.
> The current size for the same number of docs (taken from the stats page)
> is 6.5 GB.
>
> I am not sure what has changed or whether there is any config change that
> I could use. The write lock is disabled on dev with lock-type = single. I
> am not sure if this matters.
>
> -Sundar

--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/