Try using LuSql to create the index. It is 4-10 times faster on a
multicore machine, and can run in 1/20th the heap size Solr needs.
See slides 22-25 in this presentation comparing Solr DIH with LuSql:
 http://code4lib.org/files/glen_newton_LuSql.pdf

LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Disclosure: I am the author of LuSql.

Glen Newton
http://zzzoot.blogspot.com/

2009/7/13 Gurjot Singh <gurjot...@gmail.com>:
> Hi,
> We have a Solr index of 626 MB with 141810 documents indexed. We have
> configured an index-based spellchecker with the buildOnCommit option set
> to true. The spellcheck index is 8.67 MB.
>
> We use the data import handler to create the index from scratch and also to
> update the index periodically. We have set up a job that runs a full import
> once every week and a delta import every 20 mins. The full import
> takes about 38 mins to complete and the delta import takes about 12 mins to
> complete. The index also serves search queries (even while the
> delta import is running). On average, 25 to 30 documents change between
> delta imports.
>
> Is there a way to reduce the time the delta import takes to update the
> index?
> The system specs are
> MS Windows Server 2003 R2
> Standard x64 Edition
> 8 GB RAM.
> Solr is set up on Tomcat 6.0
>
> The CPU utilization of the tomcat.exe at the time of delta import is 60%.
>
> In the data-config.xml file there are 6 root entities for 6 database tables
> under the <document> element. The first root entity gets the rows from
> table1, the 2nd root entity gets the rows from table2, and so on. The root
> entities have several child entities to get the fields from associated
> tables.
>
> The mergeFactor is set to 10 and ramBufferSizeMB is set to 32. The cache
> settings are as follows:
>
> <filterCache class="solr.LRUCache" size="16384" initialSize="4096"
> autowarmCount="4096"/>
> <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096"
> autowarmCount="4096"/>
> <documentCache class="solr.LRUCache" size="16384" initialSize="16384"
> autowarmCount="0"/>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> Is it advisable to use a master-slave configuration? Does an index size of
> 626 MB justify changing from the existing single Solr core (which runs a delta
> import every 20 mins and also serves search queries) to a master-slave
> configuration, considering that the index size will keep
> increasing over time?
>
> Is there any other way to improve the indexing time?
>
> Thanks,
> Gurjot
>
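
For reference, the data-config.xml layout described above (root entities
with nested child entities) might look roughly like the sketch below. Only
table1 comes from the message; the JDBC driver and URL, the child1 table,
and the id, title, detail, table1_id and last_modified columns are all
hypothetical placeholders, so treat this as an illustration of the
structure and not as the poster's actual configuration.

```xml
<dataConfig>
  <!-- Hypothetical connection details; replace with your own. -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb"
              user="db_user" password="db_pass"/>
  <document>
    <!-- One root entity per table. table1 is from the message;
         the id, title and last_modified columns are assumed. -->
    <entity name="table1" pk="id"
            query="SELECT id, title FROM table1"
            deltaQuery="SELECT id FROM table1
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM table1
                              WHERE id = '${dataimporter.delta.id}'">
      <!-- Child entity: its query runs once per parent row.
           child1 and table1_id are hypothetical names. -->
      <entity name="child1"
              query="SELECT detail FROM child1
                     WHERE table1_id = '${table1.id}'"/>
    </entity>
    <!-- table2 through table6 would follow the same pattern. -->
  </document>
</dataConfig>
```

In a layout like this, deltaQuery only collects the keys of rows changed
since the last import, and deltaImportQuery then fetches each changed row
by key. Because each child query runs once per parent row, the number of
SQL round trips grows with the number of child entities, which may be worth
checking when a delta import of only 25 to 30 documents takes minutes.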


