If the file count and index size were increasing, that means Solr was still 
working.  It's possible it's just taking extra long because of such high settings.  
Bring them both down and try again.  For example, don't go over 20 with mergeFactor, 
and try just 1GB for ramBufferSizeMB.
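A minimal solrconfig.xml sketch of those suggested values (element names follow the standard indexDefaults section of the example config; treat the exact numbers as starting points, not tuned settings):

```xml
<indexDefaults>
  <!-- keep the merge factor modest and the RAM buffer around 1GB -->
  <mergeFactor>20</mergeFactor>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
</indexDefaults>
```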


Good luck!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jianbin Dai <djian...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 11:05:27 AM
> Subject: Re: How to index large set data
> 
> 
> I don't know exactly how this 3G RAM buffer is used. But what I noticed was that 
> both the index size and the file count kept increasing, and then it got stuck at 
> the commit. 
> 
> --- On Fri, 5/22/09, Otis Gospodnetic wrote:
> 
> > From: Otis Gospodnetic 
> > Subject: Re: How to index large set data
> > To: solr-user@lucene.apache.org
> > Date: Friday, May 22, 2009, 7:26 AM
> > 
> > Hi,
> > 
> > Those settings are a little "crazy".  Are you sure you
> > want to give Solr/Lucene 3G to buffer documents before
> > flushing them to disk?  Are you sure you want to use
> > a mergeFactor of 1000?  Check the logs to see if
> > there are any errors.  Look at the index directory to
> > see if Solr is actually still writing to it (file sizes
> > changing, number of files changing).  kill -QUIT the
> > JVM pid to see where things are "stuck", if they are
> > stuck...
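The kill -QUIT suggestion above can be run like this (looking the pid up via pgrep and the start.jar pattern is an assumption; adjust to however your Solr is launched):

```shell
# Find the Solr JVM pid (the start.jar pattern is an assumption) and send SIGQUIT;
# the thread dump goes to the JVM's stdout/log, not to your terminal.
SOLR_PID=$(pgrep -f start.jar | head -n 1)
if [ -n "$SOLR_PID" ]; then
  kill -QUIT "$SOLR_PID"
else
  echo "no Solr JVM found"
fi

# Alternative using the JDK's jstack tool to capture the dump to a file:
# jstack "$SOLR_PID" > threaddump.txt
```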
> > 
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > ----- Original Message ----
> > > From: Jianbin Dai 
> > > To: solr-user@lucene.apache.org;
> > noble.p...@gmail.com
> > > Sent: Friday, May 22, 2009 3:42:04 AM
> > > Subject: Re: How to index large set data
> > > 
> > > 
> > > About 2.8M total docs were created. Only the first
> > > run finished. In my 2nd try,
> > > it hung there forever at the end of indexing (I
> > > guess right before the commit),
> > > with CPU usage at 100%. In total, 5GB (2050) of index
> > > files were created. Now I have two
> > > questions:
> > > 1. Why does it hang there and fail?
> > > 2. How can I speed up the indexing?
> > > 
> > > 
> > > Here is my solrconfig.xml
> > > 
> > >     <useCompoundFile>false</useCompoundFile>
> > >     <ramBufferSizeMB>3000</ramBufferSizeMB>
> > >     <mergeFactor>1000</mergeFactor>
> > >     <maxMergeDocs>2147483647</maxMergeDocs>
> > >     <maxFieldLength>10000</maxFieldLength>
> > >     <unlockOnStartup>false</unlockOnStartup>
> > > 
> > > 
> > > 
> > > 
> > > --- On Thu, 5/21/09, Noble Paul
> > നോബിള്‍  नोब्ळ् wrote:
> > > 
> > > > From: Noble Paul നോബിള്‍ 
> > नोब्ळ् 
> > > > Subject: Re: How to index large set data
> > > > To: solr-user@lucene.apache.org
> > > > Date: Thursday, May 21, 2009, 10:39 PM
> > > > What is the total no. of docs created?
> > > > I guess it may not be memory
> > > > bound; indexing is mostly an IO-bound
> > > > operation. You may be able to
> > > > get better perf if an SSD (solid state
> > > > disk) is used.
> > > > 
> > > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> > > > wrote:
> > > > >
> > > > > Hi Paul,
> > > > >
> > > > > Thank you so much for answering my
> > > > > questions. It really helped.
> > > > > After some adjustment, basically setting
> > > > > mergeFactor to 1000 from the default value of 10, I could
> > > > > finish the whole job in 2.5 hours. I checked that during
> > > > > the run, only around 18% of memory was being used, and VIRT
> > > > > was always 1418m. I am thinking it may be restricted by the
> > > > > JVM memory setting. But I run the data import command
> > > > > through the web, i.e.,
> > > > > http://<host>:<port>/solr/dataimport?command=full-import,
> > > > > so how can I set the memory allocation for the JVM?
> > > > > Thanks again!
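On the JVM memory question above: the heap is set on the command that starts Solr itself, not through the DIH URL. A sketch, assuming the example Jetty start (start.jar) and a 2GB heap (both assumptions):

```shell
# The heap flags go on the command line that launches Solr (illustrative only):
SOLR_JAVA_CMD="java -Xms512m -Xmx2g -jar start.jar"
echo "$SOLR_JAVA_CMD"

# For Tomcat or another servlet container, put them in JAVA_OPTS instead:
# export JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2g"
```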
> > > > >
> > > > > JB
> > > > >
> > > > > --- On Thu, 5/21/09, Noble Paul
> > നോബിള്‍
> > > >  नोब्ळ् 
> > > > wrote:
> > > > >
> > > > >> From: Noble Paul നോബിള്‍
> > > >  नोब्ळ् 
> > > > >> Subject: Re: How to index large set
> > data
> > > > >> To: solr-user@lucene.apache.org
> > > > >> Date: Thursday, May 21, 2009, 9:57 PM
> > > > >> Check the status page of DIH and see
> > > > >> if it is working properly, and
> > > > >> if yes, what the rate of indexing is.
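The DIH status check mentioned above is just an HTTP request to the dataimport handler; with curl it would look like this (localhost:8983 is an assumed host/port):

```shell
# Query the DataImportHandler status page (host/port are assumptions):
DIH_URL="http://localhost:8983/solr/dataimport?command=status"
echo "would request: $DIH_URL"
# curl "$DIH_URL"
```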
> > > > >>
> > > > >> On Thu, May 21, 2009 at 11:48 AM,
> > Jianbin Dai
> > > > 
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> > I have about 45GB of XML files to be
> > > > >> > indexed. I am using
> > > > >> > DataImportHandler. I started the full
> > > > >> > import 4 hours ago,
> > > > >> > and it's still running.....
> > > > >> > My computer has 4GB of memory. Any
> > > > >> > suggestions?
> > > > >> > Thanks!
> > > > >> >
> > > > >> > JB
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -----------------------------------------------------
> > > > >> Noble Paul | Principal Engineer| AOL | http://aol.com
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > -----------------------------------------------------
> > > > Noble Paul | Principal Engineer| AOL | http://aol.com
> > > > 
> > 
> > 
