Hi Otis,

I understand this is a slightly off-track question, but I am curious about search performance on a 20 GB index. What have you observed?
Regards,
Eswar

On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Mike is right about the occasional slow-down, which appears as a pause and
> is due to large Lucene index segment merging. This should go away with
> newer versions of Lucene, where merging happens in the background.
>
> That said, we just indexed about 20MM documents on a single 8-core machine
> with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took
> a little less than 10 hours; that's over 550 docs/second. The vanilla
> approach, before some of our changes, apparently required several days to
> index the same amount of data.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Mike Klaas <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 19, 2007 5:50:19 PM
> Subject: Re: Any tips for indexing large amounts of data?
>
> There should be some slowdown in larger indices, as large segment merge
> operations must occasionally occur. However, this shouldn't affect
> overall speed too much.
>
> You haven't really given us enough data to tell you anything useful.
> I would recommend trying to do the indexing via a webapp to eliminate
> all your code as a possible factor. Then look for signs of what is
> happening when indexing slows. For instance, is Solr high in CPU, is
> the computer thrashing, etc.?
>
> -Mike
>
> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>
> > Hi,
> >
> > Thanks for answering this question a while back. I have made some
> > of the changes you suggested, i.e. not committing until I've
> > finished indexing. What I am seeing, though, is that as the index
> > gets larger (around 1 GB), indexing takes a lot longer. In fact it
> > slows to a crawl. Have you got any pointers as to what I might
> > be doing wrong?
> >
> > Also, I was looking at using MultiCore Solr. Could this help in
> > some way?
> >
> > Thank you
> > Brendan
> >
> > On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
> >
> >> : I would think you would see better performance by allowing auto
> >> : commit to handle the commit size instead of reopening the
> >> : connection all the time.
> >>
> >> If your goal is "fast" indexing, don't use autoCommit at all ... just
> >> index everything, and don't commit until you are completely done.
> >>
> >> autoCommitting will slow your indexing down (the benefit being that
> >> more results will be visible to searchers as you proceed).
> >>
> >> -Hoss
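Hoss's advice (send everything, commit only once when completely done, no autoCommit) can be sketched as a small client. This is a minimal, non-authoritative Python sketch that assumes Solr's XML update message format (`<add><doc><field name="...">...</field></doc></add>` and `<commit/>`) posted to an update handler, and assumes autoCommit is not enabled in solrconfig.xml. The URL `http://localhost:8983/solr/update` and the helper names `make_http_post`, `add_message` and `index_then_commit` are hypothetical; adjust the host, port and core name for your setup.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List

# Hypothetical update URL; adjust host, port and core name for your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# A "post" function sends one update message body to Solr.
Post = Callable[[bytes], None]

COMMIT_MESSAGE = b"<commit/>"


def make_http_post(url: str = SOLR_UPDATE_URL) -> Post:
    """Return a function that POSTs an XML update message to the given URL."""
    def post(body: bytes) -> None:
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
        )
        with urllib.request.urlopen(req) as resp:  # raises HTTPError on non-2xx
            resp.read()
    return post


def add_message(docs: List[Dict[str, object]]) -> bytes:
    """Build one <add> message holding several <doc> elements."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(d, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def index_then_commit(
    docs: Iterable[Dict[str, object]], post: Post, batch_size: int = 1000
) -> int:
    """Send all documents in batches, then issue exactly one commit at the end.

    No commit is sent between batches, so new documents only become
    visible to searchers after the final commit.
    """
    batch: List[Dict[str, object]] = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            post(add_message(batch))
            sent += len(batch)
            batch = []
    if batch:  # flush the last partial batch
        post(add_message(batch))
        sent += len(batch)
    post(COMMIT_MESSAGE)  # the one and only commit
    return sent
```

Against a live server this would be called as `index_then_commit(docs, make_http_post())`. Passing `post` in as a parameter keeps the batching and commit logic testable without a running Solr. Batching bounds the size of each request, while skipping intermediate commits avoids the cost Hoss mentions; the trade-off is that nothing new is searchable until the final commit.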