Hi all,
We have a cluster of 4 servers for the application and just one server for Solr. We have only about 2 million docs to index, and we never bothered to cluster the Solr environment because Solr was performing well with the current setup. We recently discovered a problem, and I am not sure what the right way to address it would be.
We have a cron job that runs from the application and does a nightly index of data added from another enterprise application. The index job reindexes all courses, whether or not they are already indexed. We observed that the job was starting on all 4 servers at about the same time. All 4 servers point to the same Solr box, so the same data is apparently added to Solr 4 times. There is an update command for every 10,000 rows fetched from the database and a commit at the end of the full job.
What surprised me is that even though a primary key is defined in the Solr schema, the size of the data folder keeps growing and is causing the Solr server to run out of disk space. I upgraded to version 1.3 about a month ago, and I suspect the problem started after that upgrade. The index for a few million docs on the clustered dev environment used to be about 520 MB, and it is about that size the first time we index all the courses. The current size for the same number of docs (taken from the stats page) is 6.5 GB. I am not sure what has changed, or whether there is a config change that would help. The write lock is disabled on dev with lock-type = single. I am not sure if this matters.
-Sundar
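P.S. To make the indexing flow above concrete, here is a rough Python sketch of what the nightly job does: one update per 10,000 rows and a single commit at the end. It is not our actual job code. The names `run_nightly_index`, `send_update`, `send_commit` and the in-memory `FakeIndex` are hypothetical stand-ins, and `FakeIndex` only models the expectation that a doc with an existing primary key replaces the old one. It says nothing about what Solr does on disk.

```python
from typing import Callable, Dict, Iterable, List

BATCH_SIZE = 10_000  # one update request per 10,000 rows, as described above

Doc = Dict[str, str]


def run_nightly_index(
    rows: Iterable[Doc],
    send_update: Callable[[List[Doc]], None],
    send_commit: Callable[[], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Send rows in batches, commit once at the end.

    Returns the number of update requests that were sent.
    """
    batch: List[Doc] = []
    updates = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            send_update(batch)
            updates += 1
            batch = []
    if batch:  # flush the final partial batch
        send_update(batch)
        updates += 1
    send_commit()  # single commit for the whole job
    return updates


class FakeIndex:
    """Hypothetical in-memory stand-in for the Solr box, keyed by primary key."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}
        self.adds = 0      # total docs received, including re-adds
        self.commits = 0

    def update(self, batch: List[Doc]) -> None:
        for doc in batch:
            self.docs[doc["id"]] = doc  # same key replaces the earlier doc
            self.adds += 1

    def commit(self) -> None:
        self.commits += 1
```

Running `run_nightly_index` 4 times against the same `FakeIndex`, as the 4 servers effectively do, leaves the number of distinct docs unchanged, which is why I expected the index size to stay roughly the same.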