Try using LuSql to create the index. It is 4-10 times faster on a multicore machine, and can run in 1/20th the heap size Solr needs. See slides 22-25 in this presentation comparing Solr DIH with LuSql: http://code4lib.org/files/glen_newton_LuSql.pdf
LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Disclosure: I am the author of LuSql.

Glen Newton
http://zzzoot.blogspot.com/

2009/7/13 Gurjot Singh <gurjot...@gmail.com>:
> Hi,
> We have a Solr index of size 626 MB, and the number of documents indexed is
> 141810. We have configured an index-based spellchecker with the buildOnCommit
> option set to true. The spellcheck index is 8.67 MB.
>
> We use the data import handler to create the index from scratch and also to
> update the index periodically. We have created a job to run a full import
> once every week and a delta import every 20 mins. The full import
> takes about 38 mins to complete and the delta import takes about 12 mins to
> complete. The index also serves search queries (even while the
> delta import is running). The number of documents changed during
> each delta import is, on average, 25 to 30.
>
> Is there a way to reduce the amount of time the delta import takes to update
> the index?
> The system specs are:
> MS Windows Server 2003 R2
> Standard x64 Edition
> 8 GB RAM.
> Solr is set up on Tomcat 6.0
>
> The CPU utilization of tomcat.exe at the time of the delta import is 60%.
>
> In the data-config.xml file there are 6 root entities for 6 database tables
> under the <Document> element. The first root entity gets the rows from
> table1, the 2nd root entity gets the rows from table2, and so on. The root
> entities have several child entities to get the fields from associated
> tables.
>
> The mergeFactor is set to 10 and ramBufferSizeMB is set to 32. The following
> are the cache settings:
>
> <filterCache class="solr.LRUCache" size="16384" initialSize="4096"
> autowarmCount="4096"/>
> <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096"
> autowarmCount="4096"/>
> <documentCache class="solr.LRUCache" size="16384" initialSize="16384"
> autowarmCount="0"/>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> Is it advisable to use a master-slave configuration? Does the index size of
> 626 MB justify the change from the existing single Solr core (on which the
> delta import runs every 20 mins and which also serves search queries) to a
> master-slave configuration, taking into consideration that the index size
> will keep increasing over time?
>
> Is there any other way to improve the indexing time?
>
> Thanks,
> Gurjot