There is already an issue open for writing to Solr in multiple threads: SOLR-1089.
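The client-push approach Grant describes below (several threads sending batches of docs instead of DIH pulling single-threaded) can be sketched with plain JDK classes. This is an illustrative sketch, not Solr API: the thread count, batch size, and the placeholder sender are made up here; a real client would POST each batch to `/solr/update` or use the SolrJ client from SOLR-1089.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch: fan documents out over several sender threads instead of letting
// DIH pull them single-threaded. The Consumer stands in for real send code
// (e.g. an HTTP POST of the batch to /solr/update).
public class ParallelIndexer {

    /** Splits docs into batches, sends them concurrently, returns docs sent. */
    public static int indexAll(List<String> docs, int threads, int batchSize,
                               Consumer<List<String>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            final List<String> batch =
                new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size())));
            // Each batch becomes one task; the pool sends them in parallel.
            futures.add(pool.submit(() -> { sender.accept(batch); return batch.size(); }));
        }
        int total = 0;
        try {
            for (Future<Integer> f : futures) total += f.get(); // surfaces send errors
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("<doc id='" + i + "'/>");
        // Placeholder sender: a real one would HTTP POST the batch to Solr.
        int sent = indexAll(docs, 4, 100, batch -> { /* post batch here */ });
        System.out.println("sent " + sent + " docs"); // prints "sent 1000 docs"
    }
}
```

The point of the pattern is that the bottleneck moves from one pulling thread to however many sender threads the server can absorb; error handling stays simple because each batch is an independent task.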
On Fri, May 22, 2009 at 6:08 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Can you parallelize this? I don't know that the DIH can handle it, but
> having multiple threads sending docs to Solr is the best performance-wise,
> so maybe you need to look at alternatives to pulling with DIH and instead
> use a client to push into Solr.
>
> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
>
>> About 2.8M total docs were created. Only the first run finished. In my
>> 2nd try, it hangs there forever at the end of indexing (I guess right
>> before commit), with CPU usage at 100%. A total of 5GB (2050) index files
>> were created. Now I have two problems:
>> 1. Why does it hang there and fail?
>> 2. How can I speed up the indexing?
>>
>> Here is my solrconfig.xml:
>>
>>   <useCompoundFile>false</useCompoundFile>
>>   <ramBufferSizeMB>3000</ramBufferSizeMB>
>>   <mergeFactor>1000</mergeFactor>
>>   <maxMergeDocs>2147483647</maxMergeDocs>
>>   <maxFieldLength>10000</maxFieldLength>
>>   <unlockOnStartup>false</unlockOnStartup>
>>
>> --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:
>>
>>> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>>> Subject: Re: How to index large set data
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, May 21, 2009, 10:39 PM
>>>
>>> What is the total no. of docs created? I guess it may not be memory
>>> bound; indexing is mostly an IO-bound operation. You may be able to
>>> get better perf if an SSD (solid state disk) is used.
>>>
>>> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai <djian...@yahoo.com> wrote:
>>>>
>>>> Hi Paul,
>>>>
>>>> Thank you so much for answering my questions. It really helped.
>>>>
>>>> After some adjustment, basically setting mergeFactor to 1000 from the
>>>> default value of 10, I could finish the whole job in 2.5 hours. I checked
>>>> that during the run, only around 18% of memory was being used, and VIRT
>>>> was always 1418m. I am thinking it may be restricted by the JVM memory
>>>> setting. But I run the data import command through the web, i.e.,
>>>> http://<host>:<port>/solr/dataimport?command=full-import,
>>>> so how can I set the memory allocation for the JVM?
>>>>
>>>> Thanks again!
>>>>
>>>> JB
>>>>
>>>> --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:
>>>>
>>>>> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>>>>> Subject: Re: How to index large set data
>>>>> To: solr-user@lucene.apache.org
>>>>> Date: Thursday, May 21, 2009, 9:57 PM
>>>>>
>>>>> Check the status page of DIH and see if it is working properly, and
>>>>> if yes, what is the rate of indexing?
>>>>>
>>>>> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai <djian...@yahoo.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have about 45GB of xml files to be indexed. I am using
>>>>>> DataImportHandler. I started the full import 4 hours ago,
>>>>>> and it's still running....
>>>>>>
>>>>>> My computer has 4GB memory. Any suggestion on the solutions?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> JB
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
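On JB's JVM question in the thread above: DIH runs inside the Solr webapp's JVM, so the heap cannot be set via the dataimport URL; it is set where the servlet container is launched. As a rough example, assuming the Jetty `start.jar` that ships with the Solr example distribution (the 2g figure is illustrative, not a recommendation):

```shell
# Heap is controlled by the JVM that hosts Solr, e.g. the bundled Jetty:
java -Xms512m -Xmx2g -jar start.jar

# Under Tomcat, the equivalent is usually an environment variable:
# export JAVA_OPTS="-Xms512m -Xmx2g"
```

Note that a ramBufferSizeMB of 3000 (as in the solrconfig.xml above) only helps if the heap is actually large enough to hold it.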