Hi Otis,

I understand this is a slightly off-topic question, but I am curious about
search performance on a 20 GB index. What have you observed?

Regards,
Eswar

On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:

> Mike is right about the occasional slow-down, which appears as a pause and
> is due to large Lucene index segment merging.  This should go away with
> newer versions of Lucene, where merging happens in the background.
>
> That said, we just indexed about 20 million documents on a single 8-core
> machine with 8 GB of RAM, resulting in a nearly 20 GB index.  The whole
> process took a little less than 10 hours - that's over 550 docs/second.
> The vanilla approach before some of our changes apparently required
> several days to index the same amount of data.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
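As a quick sanity check on the figures quoted above, here is the arithmetic
as a small Python sketch. It uses the rounded numbers from Otis's message
(20 million documents, 10 hours as an upper bound, 20 GB), so the results
are approximate lower and upper bounds, not measurements.

```python
# Rough throughput check using the rounded figures from the message above.
docs = 20_000_000   # "about 20 million documents"
hours = 10          # "a little less than 10 hours", so this is an upper bound
index_gb = 20       # "nearly 20 GB index", also an upper bound

# Lower bound on indexing rate, since the real run took slightly under 10 hours.
docs_per_second = docs / (hours * 3600)

# Approximate index size per document, taking 1 GB as 1024**3 bytes.
bytes_per_doc = index_gb * 1024**3 / docs

print(f"{docs_per_second:.0f} docs/second")
print(f"{bytes_per_doc:.0f} bytes of index per document")
```

The rate comes out a little above 550 docs/second, consistent with the
number quoted, and the index works out to roughly 1 KB per document.
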
> ----- Original Message ----
> From: Mike Klaas <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 19, 2007 5:50:19 PM
> Subject: Re: Any tips for indexing large amounts of data?
>
> There should be some slowdown in larger indices as occasionally large
> segment merge operations must occur.  However, this shouldn't really
> affect overall speed too much.
>
> You haven't really given us enough data to tell you anything useful.
> I would recommend trying to do the indexing via a webapp to eliminate
> all your code as a possible factor.  Then, look for signs of what is
> happening when indexing slows.  For instance, is Solr using a lot of CPU,
> is the computer thrashing, etc.?
>
> -Mike
>
> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>
> > Hi,
> >
> > Thanks for answering this question a while back. I have applied some
> > of the suggestions you mentioned, i.e. not committing until I've
> > finished indexing. What I am seeing, though, is that as the index gets
> > larger (around 1 GB), indexing takes a lot longer. In fact, it
> > slows down to a crawl. Have you got any pointers as to what I might
> > be doing wrong?
> >
> > Also, I was looking at using MultiCore Solr. Could this help in
> > some way?
> >
> > Thank you
> > Brendan
> >
> > On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
> >
> >>
> >> : I would think you would see better performance by allowing
> >> : auto commit to handle the commit size instead of reopening the
> >> : connection all the time.
> >>
> >> if your goal is "fast" indexing, don't use autoCommit at all ...
> >> just index everything, and don't commit until you are completely
> >> done.
> >>
> >> autoCommitting will slow your indexing down (the benefit being
> >> that more results will be visible to searchers as you proceed)
> >>
> >>
> >>
> >>
> >> -Hoss
> >>

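For anyone reading along, Hoss's advice quoted above (index everything,
commit only once at the end) could be sketched roughly as follows. This is
a minimal illustration, not anyone's actual indexing code: it assumes Solr's
XML update handler is reachable at the URL below, that autoCommit is not
enabled in solrconfig.xml, and that documents are plain field/value
dictionaries. The URL, the batch size of 1000, and the helper names
(add_xml, post_xml, index_all) are all hypothetical.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed location of the update handler; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def add_xml(docs):
    """Build one Solr <add> message from an iterable of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_all(docs, batch_size=1000, send=post_xml):
    """Send docs in batches with no intermediate commits, then commit once."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send(add_xml(batch))
            batch = []
    if batch:  # flush the final partial batch
        send(add_xml(batch))
    send("<commit/>")  # the only commit: documents become searchable here
```

Calling index_all(my_docs) would then post everything and issue a single
commit at the very end. The send parameter is only there so the batching
can be tried without a running Solr, e.g. by passing a list's append method.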