Thank you for your response. I have not written the StreamingUpdateSolrServer
application yet, but I did change the other settings that you mentioned. They
gave me a huge boost in indexing speed. (I am still using post.sh but hope to
change that soon.)
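
For anyone following the thread, here is a minimal sketch of the batching approach Otis describes below (batches of about 1000 docs sent from N threads, N = number of CPU cores). It is not SolrJ code: the class BatchIndexer, its methods, and the `sender` callback are hypothetical names made up for illustration. In a real indexer the callback would presumably wrap a call such as SolrServer.add(Collection<SolrInputDocument>); here it only counts documents so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper, not part of SolrJ: splits documents into batches
// and hands each batch to a sender callback on a fixed-size thread pool.
final class BatchIndexer {

    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /**
     * Sends every batch through sender using `threads` worker threads.
     * Returns the number of batches sent. The sender is an assumption:
     * a real one would post the batch to Solr.
     */
    static <T> int indexAll(List<T> docs, int batchSize, int threads,
                            Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
            return pending.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // N = number of CPU cores as seen by the JVM (8 on a hyper-threaded quad core).
        int threads = Runtime.getRuntime().availableProcessors();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger sent = new AtomicInteger();
        int batches = indexAll(docs, 1000, threads, b -> sent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

The batch size and thread count are parameters so they can be tuned against the actual hardware rather than fixed at the values suggested in the thread.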

One thing I noticed is that indexing was incredibly fast last night, but today
the commits are taking a very long time. Is this to be expected?



-- 
Best Regards,

Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Wednesday, April 27, 2011 at 6:15 PM, Otis Gospodnetic wrote: 
> Hi Charles,
> 
> Yes, the threads I was referring to are in the context of the client/indexer,
> so one of the params for StreamingUpdateSolrServer.
> post.sh/jar are just there because they are handy. Don't use them for 
> production.
> 
> It's impossible to tell how long indexing of 100M documents may take. They
> could be very big or very small. You could perform very light or no analysis
> or heavy analysis. They could contain 1 or 100 fields. :)
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> ----- Original Message ----
> > From: Charles Wardell <charles.ward...@bcsolution.com>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, April 26, 2011 8:01:28 PM
> > Subject: Re: Question on Batch process
> > 
> > Thank you Otis.
> > Without trying to appear too stupid, when you refer to having the params 
> > matching your # of CPU cores, are you talking about the # of threads I can 
> > spawn with the StreamingUpdateSolrServer object?
> > Up until now, I have been just utilizing post.sh or post.jar. Are these 
> > capable of that or do I need to write some code to collect a bunch of files 
> > into the buffer and send it off?
> > 
> > Also, do you have a sense for how long it should take to index 100,000 
> > files, or in my case 100,000,000 documents?
> > StreamingUpdateSolrServer
> > public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int 
> > threadCount) throws MalformedURLException
> > 
> > Thanks again,
> > Charlie
> > 
> > -- 
> > Best Regards,
> > 
> > Charles Wardell
> > Blue Chips Technology, Inc.
> > www.bcsolution.com
> > 
> > On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote: 
> > > Charlie,
> > > 
> > > How's this:
> > > * -Xmx2g
> > > * ramBufferSizeMB 512
> > > * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
> > > * ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
> > > * use StreamingUpdateSolrServer (with params matching your number of CPU cores)
> > >   or send batches of say 1000 docs with the other SolrServer impl using N threads
> > >   (N=# of your CPU cores)
> > > 
> > > Otis
> > >  ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > > 
> > > 
> > > 
> > > ----- Original Message ----
> > > > From: Charles Wardell <charles.ward...@bcsolution.com>
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Tue, April 26, 2011 2:32:29 PM
> > > > Subject: Question on Batch process
> > > > 
> > > > I am sure that this question has been asked a few times, but I can't
> > > > seem to find the sweet spot for indexing.
> > > > 
> > > > I have about 100,000 files each containing 1,000 xml documents ready to
> > > > be posted to Solr. My desire is to have it index as quickly as possible
> > > > and then once completed the daily stream of ADDs will be small in
> > > > comparison.
> > > > 
> > > > The individual documents are small. Essentially web postings from the
> > > > net. Title, postPostContent, date.
> > > > 
> > > > 
> > > > What would be the ideal configuration for ramBufferSizeMB, mergeFactor,
> > > > maxBufferedDocs, etc.?
> > > > 
> > > > My machine is a quad-core with hyper-threading, so it shows up as 8 CPUs
> > > > in top. I have 16GB of available RAM.
> > > > 
> > > > 
> > > > Thanks in advance.
> > > > Charlie
> 
