Can you parallelize this? I don't know whether the DIH can handle it, but having multiple threads sending docs to Solr gives the best performance, so you may need to look at alternatives to pulling with DIH and instead use a client to push docs into Solr.
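A minimal sketch of the push approach, using SolrJ's StreamingUpdateSolrServer (added in Solr 1.4): it keeps an internal queue that a pool of background threads drains, so even a single-threaded XML parser gets concurrent HTTP updates. The URL, field names, and loop below are placeholders, not taken from this thread:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelPush {
        public static void main(String[] args) throws Exception {
            // Queue up to 20000 docs; 4 background threads stream them to Solr.
            SolrServer server =
                    new StreamingUpdateSolrServer("http://localhost:8983/solr", 20000, 4);

            for (int i = 0; i < 1000; i++) {   // stand-in for the real XML parsing loop
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "body of document " + i);
                server.add(doc);               // returns quickly; sending happens on the pool
            }
            server.commit();                   // one commit at the end of the run
        }
    }

On Solr 1.3, an ExecutorService feeding a plain CommonsHttpSolrServer achieves a similar effect.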

On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:


About 2.8 million docs were created in total, but only the first run finished. On my second try it hangs forever at the end of indexing (I guess right before the commit), with CPU usage at 100%. In total, 5 GB of index files (2,050 files) were created. Now I have two problems:
1. Why does it hang there and fail?
2. How can I speed up the indexing?


Here is my solrconfig.xml

   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>3000</ramBufferSizeMB>
   <mergeFactor>1000</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>10000</maxFieldLength>
   <unlockOnStartup>false</unlockOnStartup>
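(For reference: ramBufferSizeMB is the size of Lucene's in-memory indexing buffer, so 3000 MB has to fit inside the JVM heap, and mergeFactor controls how many segments accumulate before a merge, so 1000 can mean a very long merge stall at commit time. At the Lucene level these settings map roughly to the calls below; the path and values are illustrative only, Lucene 2.4-era API:)

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class WriterKnobs {
        public static void main(String[] args) throws Exception {
            // Illustrative index path; these setters are what the
            // <indexDefaults> values in solrconfig.xml ultimately control.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory(new File("/tmp/index")),
                    new StandardAnalyzer(),
                    true,                                  // create a fresh index
                    IndexWriter.MaxFieldLength.UNLIMITED); // cf. <maxFieldLength>
            writer.setRAMBufferSizeMB(256.0); // <ramBufferSizeMB>: lives on the JVM heap
            writer.setMergeFactor(10);        // <mergeFactor>: segments merged per level
            writer.close();
        }
    }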




--- On Thu, 5/21/09, Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com> wrote:

From: Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject: Re: How to index large set data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009, 10:39 PM
What is the total number of docs created? I guess it may not be memory bound; indexing is mostly an IO-bound operation. You may be able to get better performance if an SSD (solid state disk) is used.

On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai <djian...@yahoo.com> wrote:

Hi Paul,

Thank you so much for answering my questions. It really helped.
After some adjustment, basically setting mergeFactor to 1000 from the default value of 10, I could finish the whole job in 2.5 hours. I checked that during the run only around 18% of memory was being used, and VIRT was always 1418m. I am thinking it may be restricted by the JVM memory setting. But I run the data import command through the web, i.e.,

http://<host>:<port>/solr/dataimport?command=full-import

so how can I set the memory allocation for the JVM?
Thanks again!

JB
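(The import URL cannot set memory; the heap is fixed when the servlet container's JVM is launched, e.g. java -Xmx2g -jar start.jar for the Jetty example that ships with Solr, or via JAVA_OPTS/CATALINA_OPTS for Tomcat. A minimal sketch of what -Xmx governs; run inside the same JVM, for instance via JMX or a JSP, these calls would show what the Solr process actually received:)

    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // maxMemory() reports the -Xmx ceiling the JVM was started with;
            // compare it with the VIRT figure quoted above.
            System.out.println("max heap (MB):  " + rt.maxMemory() / (1024 * 1024));
            System.out.println("committed (MB): " + rt.totalMemory() / (1024 * 1024));
        }
    }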

--- On Thu, 5/21/09, Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com> wrote:

From: Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject: Re: How to index large set data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009, 9:57 PM
Check the status page of DIH and see if it is working properly, and if yes, what the rate of indexing is.
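(The status page is the handler URL itself; host, port, and path below are placeholders, assuming the DIH is registered at /solr/dataimport. A minimal Java poll:)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class DihStatusPoll {
        public static void main(String[] args) throws Exception {
            // command=status returns an XML snapshot with counters such as
            // "Total Documents Processed", from which the indexing rate follows.
            URL status = new URL("http://localhost:8983/solr/dataimport?command=status");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(status.openStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // raw status response
            }
            in.close();
        }
    }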

On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai <djian...@yahoo.com> wrote:

Hi,

I have about 45GB of XML files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running....
My computer has 4GB of memory. Any suggestions on solutions?
Thanks!

JB
--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
