No need to use the embedded SolrServer. You can use SolrJ with streaming in multiple threads.
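For reference, here is a minimal sketch of the "parse the XML yourself and push from several threads" approach discussed in this thread. It uses Python and only the standard library so it runs anywhere. It is not SolrJ and not part of DIH; in Java, the streaming client suggested above is presumably SolrJ's StreamingUpdateSolrServer. All of the following are hypothetical assumptions: each record in the input is a flat `<doc>` element directly under the root, with one child element per field; `http_sender` posts Solr's XML update format to an update URL you supply; `batch_size=1000` and `threads=4` are illustrative defaults, not tuned values.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, str]


def stream_docs(source, tag: str = "doc") -> Iterator[Doc]:
    """Yield one {field: text} dict per <tag> element without loading the file.

    Hypothetical layout: every <tag> record is a direct child of the root.
    Clearing the root after each record keeps memory flat instead of
    building the whole tree for a multi-GB file.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text or "" for child in elem}
            root.clear()  # detach finished records from the root


def batches(docs: Iterator[Doc], size: int) -> Iterator[List[Doc]]:
    """Group a doc stream into lists of at most `size` docs."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_solr_xml(batch: List[Doc]) -> bytes:
    """Render a batch as <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in batch:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = value
    return ET.tostring(add, encoding="utf-8")


def http_sender(update_url: str) -> Callable[[List[Doc]], None]:
    """Hypothetical sender: POST each batch to a Solr XML update URL."""
    def send(batch: List[Doc]) -> None:
        req = urllib.request.Request(
            update_url,
            data=to_solr_xml(batch),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        # urlopen raises HTTPError if Solr answers with an error status.
        with urllib.request.urlopen(req) as resp:
            resp.read()
    return send


def push_parallel(docs: Iterator[Doc], send: Callable[[List[Doc]], None],
                  batch_size: int = 1000, threads: int = 4) -> int:
    """Send batches from `threads` workers; return the number of docs sent."""
    # Bound the batches in flight so a fast reader cannot queue the whole file.
    slots = threading.BoundedSemaphore(threads * 2)
    futures = []

    def run(batch: List[Doc]) -> int:
        send(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches(docs, batch_size):
            slots.acquire()
            future = pool.submit(run, batch)
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
    # result() re-raises the error of the first failed batch, in submit order.
    return sum(f.result() for f in futures)
```

Against a real file this would look like `push_parallel(stream_docs("data.xml"), http_sender("http://localhost:8983/solr/update"))`. The file name and URL are placeholders; 8983 is the port used by the Solr example setup. Follow it with a single commit once every batch is in (for example, by posting `<commit/>` to the same URL), not one commit per batch. The semaphore caps how many batches wait in memory, so the parser stalls behind slow workers instead of reading far ahead.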
On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai <djian...@yahoo.com> wrote:
>
> If I do the XML parsing myself and use the embedded client to do the push,
> would it be more efficient than DIH?
>
>
> --- On Fri, 5/22/09, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> From: Grant Ingersoll <gsing...@apache.org>
>> Subject: Re: How to index large set data
>> To: solr-user@lucene.apache.org
>> Date: Friday, May 22, 2009, 5:38 AM
>>
>> Can you parallelize this? I don't know that the DIH can handle it,
>> but having multiple threads sending docs to Solr is the best
>> performance-wise, so maybe you need to look at alternatives to pulling
>> with DIH and instead use a client to push into Solr.
>>
>>
>> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
>>
>> > About 2.8M total docs were created. Only the first run finished. On
>> > my 2nd try, it hangs forever at the end of indexing (I guess right
>> > before the commit), with CPU usage at 100%. In total, 5 GB of index
>> > files (2050 files) were created. Now I have two problems:
>> > 1. Why does it hang there and fail?
>> > 2. How can I speed up the indexing?
>> >
>> > Here is my solrconfig.xml:
>> >
>> > <useCompoundFile>false</useCompoundFile>
>> > <ramBufferSizeMB>3000</ramBufferSizeMB>
>> > <mergeFactor>1000</mergeFactor>
>> > <maxMergeDocs>2147483647</maxMergeDocs>
>> > <maxFieldLength>10000</maxFieldLength>
>> > <unlockOnStartup>false</unlockOnStartup>
>> >
>> > --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:
>> >
>> >> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>> >> Subject: Re: How to index large set data
>> >> To: solr-user@lucene.apache.org
>> >> Date: Thursday, May 21, 2009, 10:39 PM
>> >>
>> >> What is the total no. of docs created? I guess it may not be memory
>> >> bound; indexing is mostly an IO-bound operation.
>> >> You may be able to get better perf if an SSD (solid state disk) is
>> >> used.
>> >>
>> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai <djian...@yahoo.com>
>> >> wrote:
>> >>>
>> >>> Hi Paul,
>> >>>
>> >>> Thank you so much for answering my questions. It really helped.
>> >>> After some adjustment, basically setting mergeFactor to 1000 from
>> >>> the default value of 10, I could finish the whole job in 2.5 hours.
>> >>> I checked that during the run only around 18% of memory is being
>> >>> used, and VIRT is always 1418m. I think it may be restricted by the
>> >>> JVM memory setting. But I run the data import command through the
>> >>> web, i.e.,
>> >>> http://<host>:<port>/solr/dataimport?command=full-import,
>> >>> so how can I set the memory allocation for the JVM?
>> >>> Thanks again!
>> >>>
>> >>> JB
>> >>>
>> >>> --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>> >>> wrote:
>> >>>
>> >>>> From: Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>
>> >>>> Subject: Re: How to index large set data
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Date: Thursday, May 21, 2009, 9:57 PM
>> >>>>
>> >>>> Check the status page of DIH and see if it is working properly,
>> >>>> and if yes, what the rate of indexing is.
>> >>>>
>> >>>> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai <djian...@yahoo.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have about 45 GB of XML files to be indexed. I am using
>> >>>>> DataImportHandler. I started the full import 4 hours ago, and
>> >>>>> it's still running...
>> >>>>> My computer has 4 GB of memory. Any suggestions for a solution?
>> >>>>> Thanks!
>> >>>>>
>> >>>>> JB
>> >>>>
>> >>>> --
>> >>>> -----------------------------------------------------
>> >>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >>
>> >
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com