> - As per
> http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf

Sorry, the presentation covers a lot of ground: see slide #20, "Standard
thread pools can have high contention for task queue and other data
structures when used with fine-grained tasks". [I haven't yet implemented
work stealing.]
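The per-index pool idea discussed in this thread (give each index its own ThreadPoolExecutor, so producers never contend on one shared task queue) can be sketched with plain java.util.concurrent. This is a minimal, self-contained illustration, not code from the thread: each "index" here is just a counter standing in for a Lucene IndexWriter, and `numIndexes`, `threadsPerPool`, and `totalDocs` are illustrative assumptions. In the real setup, each partition would wrap its own IndexWriter, and the partial indexes would be merged at the end (e.g. with IndexWriter.addIndexes).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerIndexPools {
    public static void main(String[] args) throws InterruptedException {
        final int numIndexes = 4;       // one partition (and one pool) per "index"
        final int threadsPerPool = 2;
        final int totalDocs = 10_000;

        ExecutorService[] pools = new ExecutorService[numIndexes];
        AtomicInteger[] indexed = new AtomicInteger[numIndexes];
        for (int i = 0; i < numIndexes; i++) {
            // Each pool has its own internal task queue, so submissions to
            // different partitions never contend on a single shared queue.
            pools[i] = Executors.newFixedThreadPool(threadsPerPool);
            indexed[i] = new AtomicInteger();
        }

        for (int doc = 0; doc < totalDocs; doc++) {
            int part = doc % numIndexes;   // round-robin partitioning of documents
            final AtomicInteger counter = indexed[part];
            // Stand-in for IndexWriter.addDocument() on that partition's writer.
            pools[part].submit(() -> { counter.incrementAndGet(); });
        }

        int total = 0;
        for (int i = 0; i < numIndexes; i++) {
            pools[i].shutdown();
            pools[i].awaitTermination(1, TimeUnit.MINUTES);
            // Stand-in for the final IndexWriter.addIndexes() merge step.
            total += indexed[i].get();
        }
        System.out.println("indexed " + total + " docs across " + numIndexes + " partitions");
    }
}
```

Round-robin keeps the partitions roughly equal in size; any partitioning that balances document counts would do, since the partial indexes are merged afterward anyway.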
-glen

2009/4/9 Glen Newton <glen.new...@gmail.com>:
> For Solr / Lucene:
> - Use -XX:+AggressiveOpts.
> - If available, huge pages can help. See
>   http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html
>   I haven't yet followed up with my Lucene performance numbers using
>   huge pages: the gain is 10-15% for large indexing jobs.
>
> For Lucene:
> - Multi-thread using java.util.concurrent.ThreadPoolExecutor
>   (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html:
>   6.4 million full-text articles + metadata indexed, resulting in an 83GB
>   index; those are old numbers: it is down to ~10 hours now).
> - While multithreading is particularly good on multicore, it also improves
>   performance on a single core for small numbers of threads (<6, YMMV)
>   with good I/O (test for your particular configuration).
> - Use multiple indexes and merge them at the end.
> - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf,
>   use a separate ThreadPoolExecutor per index in the previous step,
>   reducing queue contention. This is giving me an additional ~10%. I will
>   blog about this in the near future...
>
> -glen
>
> 2009/4/9 sunnyfr <johanna...@gmail.com>:
>>
>> Hi Otis,
>> How did you manage that? I have an 8-core machine with 8GB of RAM and an
>> 11GB index for 14M docs, with 50,000 updates every 30 minutes, but my
>> replication kills everything. My segments are merged too often, so the
>> full index is replicated and the caches are lost, and so on. I have no
>> idea what I can do now. Some help would be brilliant.
>> BTW, I'm using Solr 1.4.
>>
>> Thanks,
>>
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slowdown, which appears as a pause
>>> and is due to large Lucene index segment merging. This should go away
>>> with newer versions of Lucene, where merging happens in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core
>>> machine with 8 GB of RAM, resulting in a nearly 20 GB index.
>>> The whole process took a little less than 10 hours - that's over 550
>>> docs/second. The vanilla approach, before some of our changes,
>>> apparently required several days to index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> ----- Original Message ----
>>> From: Mike Klaas <mike.kl...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices, as occasionally large
>>> segment merge operations must occur. However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor. Then, look for signs of what is
>>> happening when indexing slows. For instance, is Solr high in CPU? Is
>>> the computer thrashing?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for answering this question a while back. I have made some of
>>>> the suggestions you mentioned, i.e. not committing until I've finished
>>>> indexing. What I am seeing, though, is that as the index gets larger
>>>> (around 1GB), indexing takes a lot longer. In fact, it slows to a
>>>> crawl. Do you have any pointers as to what I might be doing wrong?
>>>>
>>>> Also, I was looking at using multi-core Solr. Could this help in
>>>> some way?
>>>>
>>>> Thank you,
>>>> Brendan
>>>>
>>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>>
>>>>>
>>>>> : I would think you would see better performance by allowing auto
>>>>> : commit to handle the commit size instead of reopening the
>>>>> : connection all the time.
>>>>>
>>>>> If your goal is "fast" indexing, don't use autoCommit at all ...
>>>>> just index everything, and don't commit until you are completely done.
>>>>>
>>>>> autoCommitting will slow your indexing down (the benefit being that
>>>>> more results will be visible to searchers as you proceed).
>>>>>
>>>>>
>>>>> -Hoss
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
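Hoss's advice above - index everything and commit once at the end - corresponds to leaving autoCommit disabled in solrconfig.xml and issuing a single explicit commit when the bulk load finishes. A minimal sketch, assuming the stock Solr 1.x update handler (the thresholds shown in the commented-out block are placeholders, not values from this thread):

```xml
<!-- solrconfig.xml: keep autoCommit disabled during bulk indexing -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
  -->
</updateHandler>
```

Then, after the last document has been sent, post one explicit `<commit/>` to the update handler so all the buffered work becomes visible to searchers at once.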